A global vision


The International Council for Science needs to define its mission and show its members that it is
worth their membership fees.


relevant national or international professional society, then some of 
your cash probably goes to fund the ICSU. What is the ICSU? The 
acronym stands for the International Council of Scientific Unions, but 
the organization now calls itself the International Council for Science. 

If you are asking what it does with your money, that is a good ques- 
tion. The ICSU and others have been asking the same thing. 

The council has its secretariat in Paris, but in the past decade has 
opened regional offices representing Africa (based in Pretoria, South 
Africa), Latin America and the Caribbean (in Mexico City), and Asia 
and Pacific (in Kuala Lumpur, Malaysia). 

Dozens of national scientific organizations from around the world 
are members of the ICSU and pay dues for the privilege. But that num- 
ber will soon shrink by one. 

Members of the International Union of Biochemistry and Molecu- 
lar Biology (IUBMB) have decided to go it alone. The organization 
has told the ICSU that it has cancelled its membership, effective from 
1 January 2015. The IUBMB felt that it was not getting value for 
money: “The visibility of the ICSU on the international stage and its 
impact on science policy were considered insufficient to justify such 
expense,” it said in its resignation letter in September.

In an increasingly crowded marketplace for scientific bodies, the 
ICSU has to get its act together — and fast — if more of its members 
are not to follow suit. 

Angelo Azzi, a vascular biologist at Tufts University in Boston, 
Massachusetts, and past president of the IUBMB, says that it is not 
about the money — the IUBMB paid just €3,395 (US$4,240) in mem- 
bership fees to the ICSU this year — but about the principle. Other 
grievances that the organization listed in its resignation letter include 
a lack of transparency over internal committee appointments, dispro- 
portionate expenditure on internal meetings compared with scientific 
activities, and lack of involvement of young scientists. 

None of this would matter if the ICSU had not shown that it is 
capable of doing good things. It has — and they are worth paying for. 
Its flagship Future Earth programme, for instance, is a well-regarded 
global research platform for projects on sustainability. 

It just needs more such efforts. An external expert-review panel that 
analysed the ICSU’s operations and submitted its report in July, ahead
of the ICSU general assembly in Auckland, New Zealand, got that feel- 
ing too. As well as having low visibility, the ICSU lacks a clear vision, 
the panel said. The ICSU posted the report on its homepage last week. 

In fact, the report criticizes most aspects of the ICSU’s operations. It 
offers a dire warning, saying that if the ICSU does not take its recom- 
mendations into account, “there is a serious risk that it will wither on 
the vine and become irrelevant over the next few years”. 

The recommendations are that the ICSU should define a vision, adopt 
a strategy and put in place a plan to achieve both through a limited 
number of flagship projects. The vision, it says, should distinguish the 
ICSU from other worldwide scientific players, such as the InterAcademy
Council and the IAP, a global network of science academies, as well as
the Global Research Council created in 2012. Furthermore, the ICSU’s 
governance needs to become more transparent, and more inclusive 
of gender and diversity agendas. The regional offices, which get most 
of their financing from local sources, need to have much more clearly 
defined relationships with the ICSU’s secretariat, governance and execu- 
tive board. 


The report also criticizes the lack of balanced representation of all
sciences in the ICSU’s activities, pointing out that biology does not
get much of a showing. And it notes that the recommendations of the
most recent previous review, back in 1996, have not been fully
implemented.

The ICSU’s president, climatologist Gordon McBean of Western 
University in London, Canada, says that the organization is taking 
the report very seriously. 

To be fair, the ICSU has a modest budget for a global organization: 
last year it brought in just €4.2 million. Much of that came from the 
subscriptions of its members, but €500,000 was provided by the French 
government. Still, as the report shows, getting the organization straight 
need not cost money. And scientists on the ground have the right to 
know what is being done in their name.


Save the museums 


Italy’s curators must band together to preserve 
their valuable collections. 


northern Italy. It was the end of the 1990s, and the university was 
finally starting to pay attention to its valuable but long-neglected 
zoological collections. 

Barbagli is passionate about birds, so he was distressed to find that 
the labels had fallen off 700 precious taxidermied specimens, devas- 
tating their scientific value. A well-intentioned but untrained staff 
member had decided to spruce up the collection, gifted to the univer- 
sity three decades earlier. He had painted the birds’ pedestals — onto 
which species names had been inscribed — and had fixed neatly typed 
labels to their feet with rubber bands. As any professional curator 
knows, rubber perishes. 

This story is emblematic of what has happened in historic scien- 
tific collections in universities and museums around Italy — some of 
the oldest and most valuable in the world. Now, there is a chance to
improve the situation. It must be taken. 

To preserve history, one must sometimes fight against it. Recent 
years have not been kind to such collections. When taxonomy went 
out of fashion in the 1970s, universities pushed aside physical speci- 
mens to make room for modern biology laboratories, and lost interest 
in paying for proper curatorship. Museologists in Italy estimate that 
at least one-third of all biological specimens — and items in other 
scientific collections such as geology or old physics instruments — 
have been lost to rotting or bad practice. 

The past decade of financial crisis has only made the situation 
worse. Many of the remaining specialized staff retired and were not 
replaced. Some important collections have no curators at all, including 
the Regional Natural History Museum of Terrasini in Sicily, home to 
10,000 stuffed birds and 1,500 entomological cases. The country has 
no professional courses that could train the next generation of cura- 
tors. Special funding for small museums is close to zero. 

Last month, Barbagli helped to organize a meeting of museum and 
scientific-collection experts in Rome, to work out how to turn the situ- 
ation around. He did not have to look too far. Collections in Germany 
have also suffered neglect, but researchers there seem to have a solution. 

German museologists organized themselves into a united front. They 
catalogued their collections and began a protracted lobbying campaign 
— until the Wissenschaftsrat, Germany’s national science-policy advi- 
sory body, understood what would be at stake if collections continued 
to be lost. In 2011, it issued a report that described collections as an
“indispensable basis” for research from anthropology and archaeology
to geoscience and the history of art. This report — essentially declaring
collections to be a valid research infrastructure — smoothed the way for
change. A national coordination centre has now been established that
offers resources and advice to any researcher, directing them to materials
kept around the country.

Italian museologists have now started to organize themselves in the same
way, cataloguing collections. They have wisely decided not to lose time
asking their cash-strapped government for financing, but to call instead
for a better organization to protect their scientific heritage at a
national level.

In 2004, Italy legally recognized the value of its scientific heritage and
placed it under the control of the ministry of culture, alongside objects
of art. But that ministry lacked the scientific experts who might have
established a meaningful protective organization.

Responsibility for scientific heritage would be better embedded in
the ministry for science. Ideally, small museums would organize into a
network, grouped according to scientific field rather than location. This
network would be headed by a few ministry officials who would make
sure that resources and academic expertise are shared appropriately.

Italian museologists should unite to push for such a structure, which
would cost next to nothing but be highly effective. They need to move
quickly, and to argue with a single voice. As their colleagues in
Germany have shown, the rot can be stopped.


Data-access practices
strengthened


In our continued drive for reproducibility, Nature and the Nature
research journals are strengthening our editorial links with the
journal Scientific Data and enhancing our data-availability prac-
tices. We believe that this initiative will improve support for authors
looking for appropriate public repositories for their research data,
and will increase the availability of information needed for the
reuse and validation of those data.

In 2013, Nature journals introduced new editorial measures to
promote reproducibility, and we continue to evaluate their impact
and refine our policies. Our newly strengthened data-availability
practices (go.nature.com/o5ykhe) reflect our preference that data
be deposited in public repositories, and encourage researchers to
expand on work published in the Nature journals by publishing
further information in Scientific Data.

Community-supported, specialized data repositories are usually
the best way to share large data sets. General, unstructured reposi-
tories, such as figshare and Dryad, provide options where no com-
munity repository exists, and are preferable to publishing data as
Supplementary Information. Supplementary materials have size lim-
itations and do not always provide optimal file and viewing formats,
particularly for large and complex data sets. But where no reposi-
tory — or publication focused on detailed descriptions of data sets
— exists, supplementary materials have often been the best option.

Scientific Data (go.nature.com/iyu9qh), which launched this
year, offers authors another way to maximize the value of their
data sets for further research — for themselves and for the scientific
community.

Its primary article type, the Data Descriptor, provides more
detail to improve the data’s discoverability, interpretability and
reusability — as well as allowing the highest credit to be given to
the authors who created the data set.

We are now rolling out a new process under which, when they 
accept a manuscript containing appropriate data sets, editors 
of Nature and Nature research journals will encourage authors 
to submit the data sets to Scientific Data as a Data Descriptor 
(go.nature.com/utfvfo). 

Authors may also submit a Data Descriptor manuscript along- 
side a manuscript for a Nature journal. If appropriate, they could 
publish the descriptor first, without compromising the novelty of 
future primary-research articles based on the data. In these cases, 
authors are encouraged to consult with the editor of their target 
journal to ensure that prior publication of a Data Descriptor is 
acceptable. (Note that other publishers may have different policies.) 

Scientific Data’s peer-review and in-house curation processes 
focus on ease of reuse. A data-curation editor reviews data files, 
checks their format, archiving and annotations, and works with 
authors to produce a standardized, machine-readable summary 
of the study in the ISA-Tab format (S. Sansone et al. Nature Genet. 
44, 121-126; 2012). 

Data Descriptors can accommodate all data types, including raw 
data and updated data sets generated after initial publication. They 
can also show the controls required for validation of the data set, 
which may have been excluded from the primary paper because of 
space limitations. Scientific Data’s editorial process assesses reposi- 
tories and helps to ensure that data are placed in the correct one. 
Nature’s enhanced data-availability policy now directs authors to 
a list of approved repositories (go.nature.com/jpm768). 

Several articles published in Nature research journals already 
have complementary articles in Scientific Data (such as A. Baud et al. 
Sci. Data 1, 140011 (2014) and F. Roquet et al. Sci. Data 1, 140028; 
2014). As science evolves and produces ever-increasing amounts 
of data, those data must be collected, organized, curated, quality- 
checked and made available on the right platform so that they can 
be easily discovered and reused. Stronger links with Scientific Data 
and our data-availability practices aim to achieve this.


Openness in science is key
to keeping public trust


Silence stifles progress, says Mark Yarborough. The scientific enterprise
needs a transparent culture that actively finds and fixes problems.


The Ebola crisis demonstrates once again that, despite all the
posturing of politicians, it is scientists who the public looks to
in times of crisis and concern. The public still trusts scientists.
A UK survey this year found that they trust scientists even if they do
not always trust scientific information itself. Still, the public’s trust is
fragile. Given how much scientists depend on public goodwill and
the funding that flows from it, I am always surprised by how much
scientists take the public’s trust for granted. They can — and should
— do more to protect and nurture it.

Trust in science is often discussed only in response to some scandal
or controversy, such as misconduct. This is unfortunate. Such a focus
on bad behaviour, equating concerns about trust with misconduct,
can make scientists unwilling to discuss the issue because they feel
personally criticized. As a result, they ignore or
even resist calls (such as this one) to promote and
improve the overall trustworthiness of research.

Mishaps that cast science and scientists in a bad
light and that could undermine trust are inevi-
table, particularly because many fields of science
are poorly understood by the wider public. It is
down to scientists to identify and try to prevent
such mistakes.

Things can and do go wrong in science in
countless ways, owing to the methods, technical
procedures and complexity, which can make the
most innocent of mistakes exceptionally difficult
to detect. Too often, scientists do not consider the
need for improvements because they are content
with their faith that science self-corrects. This is a
bad idea. Science's ability to weed out incorrect findings is overstated.

There might once have been a time in science when there were
multiple chances to ‘get it right’. That is much less true today. Mod-
ern scientific research is faster-moving and more connected, and the
financial and reputational stakes are now much higher. The priority
must be to try to get research right the first time, especially in bio-
medical fields. We cannot afford to leave the detection of problems
to chance.

Simply following the rules that others set will not help scientists
much either. Regulations often fail to solve the problems that give rise
to them. The United States has strengthened conflict-of-interest regu-
lations for biomedical researchers, for example, but this does nothing
to address the potential that financial relationships between research
sponsors and institutions have to cause bias, a particularly signifi-
cant shortcoming considering the extent to which large universities
treat their science divisions as money makers. Complying with rules
also tends to fatigue the research community on the one hand, and
contributes to a false sense of security that things are being taken
care of on the other.


Scientists need to articulate better what makes their work deserving of
the public’s trust in the first place. I hope that we can agree that research
should satisfy three basic expectations: publications can consistently be
relied on to inform subsequent enquiry; research is of sufficient social
value to justify the expenditures that support it; and research is
conducted in accordance with widely shared ethical norms. Making science
more trustworthy then comes down to steps to make sure those
expectations are met. We need a culture that prevents and fixes mistakes
not by chance, but by design. How can we create such a culture?

One of the most important steps is to recognize and identify where 
standards break down. We need to routinely conduct confidential sur- 
veys in individual laboratories, institutions and professional societies 
to assess the openness of communication and the extent to which
people feel safe identifying problems in a research
setting. Some research institutions, to their 
great credit, are already conducting these kinds 
of assessments, but most do not. It is crucial that 
we start to make them the norm. 

We cannot expect people to call attention to 
problems when it is not safe for them to do so. 
At present, it is unsafe in too many research set- 
tings. Those who question the status quo can 
be ostracized and labelled as troublemakers. To 
make them safer, institution leaders must be pre- 
pared to hear unwelcome news and hold their 
nerve over bad publicity. And they must convince 
staff that their desire to improve is sincere. This is 
easier said than done, but the alternative is silence 
and stifled progress. 

Building on the results of these surveys, institutions should be 
open and declare errors and near-misses. They should make public 
the actions they take to correct situations, and whether they work. 

As science becomes less bound by both individual disciplines 
and geography, opportunities for errors and mistakes increase. One 
feature that we must better investigate is how distributing work 
among teams generates errors in data gathering and analysis. Unsta- 
ble reagents can perform differently at different sites, for example, 
and a stronger emphasis on quality assurance could help us to dis- 
cover and reduce any errors that might result from this. Unlike the 
call for surveys, which demands institutional buy-in, research teams 
could direct such efforts themselves, whether or not funders or uni- 
versities push them to do it. 

While science frets over misconduct and the bad apples in our 
midst, it fails to confront the bigger problems. We must make sure 
that we reward the public trust in scientists with trustworthy science.


Mark Yarborough is dean's professor of bioethics at the University of 
California, Davis, in Sacramento, California, USA. 
e-mail: mark.yarborough@ucdmc.ucdavis.edu


RESEARCH HIGHLIGHTS
Selections from the scientific literature


Exploding DNA
goes back together


The mysterious giant
chromosomes found in some
cancers are formed when DNA
shatters and recombines.

Neochromosomes are
made up of pieces of the
46 chromosomes that each
human cell normally carries.
To study how they form, a team
led by Anthony Papenfuss
at the Walter and Eliza Hall
Institute of Medical Research in
Melbourne and David Thomas
of the Garvan Institute of
Medical Research in Sydney,
both in Australia, sequenced
the DNA of neochromosomes
isolated from liposarcomas.

They used a mathematical
model to show that certain
cancer genes can drive normal
chromosomes — in particular
chromosome 12 — to break
into pieces and reform as
circles. The circles, which
carry cancer genes, grow in
size as certain genes become
amplified, and eventually
split to form giant linear
chromosomes.

A drug targeting genes that
drive this process could kill the
cancer cells, the team proposes.
Cancer Cell 26, 653-667 (2014)


Mind manipulates
gene expression


Human brain activity has been
harnessed to control gene
expression in mice.

Martin Fussenegger at
the Swiss Federal Institute
of Technology in Zurich
and his colleagues created a
small, implantable cartridge
containing human cells
engineered to produce a
protein called SEAP when
exposed to light. The
researchers then put this
cartridge under the skin of
a mouse, along with a light-
emitting diode (LED).

When trained volunteers
transmitted certain brain-
activity patterns through a
headset to a computer, the
machine switched on an
electrical-field generator
under the mouse. The
field powered up the LED
implanted in the mouse,
causing the cells in the implant
to produce SEAP, which then
passed into the bloodstream.

The device could be
programmed to respond to
human brain activity that
predicts a seizure, for example,
and prevent the episode by
delivering a drug to the brain,
the authors say.

Nature Commun. 5, 5392 (2014)


Twisty light sends images across Vienna


Beams of light twisted into a corkscrew shape
have carried data more than 3 kilometres
over Vienna's skyline in an effort to increase
the information-carrying capacity of
electromagnetic waves.

Adding orbital angular momentum (OAM)
to laser beams — when fluctuations of light
waves are staggered along different parallel rays
— can produce a theoretically infinite range of
corkscrew patterns or modes. Mario Krenn and
Anton Zeilinger at the University of Vienna and
their colleagues used green laser light (pictured)
with 16 different OAM modes to send data from
a radar tower to a small detector across the city.
They successfully transmitted small black-and-
white pictures of Wolfgang Amadeus Mozart and
other famous Austrians. The experiment showed
that OAM modes can survive much longer trips
through the atmosphere than expected.

New J. Phys. 16, 113028 (2014)


Eyespots shift
predators’ attack 


Eye-shaped markings at the 
edges of butterfly wings stop 
predators from striking vital 
body parts. 

Kathleen Prudic, now at 
Oregon State University in 
Corvallis, and her team let 
praying mantids (Tenodera 
sinensis) feed on Bicyclus 
anynana butterflies, which 
have small, drab eyespots in the 
dry season and larger, brighter 
spots in the wet season. 

The mantids more readily 
detected wet-season butterflies 
than dry-season ones, but were 
less successful at capturing
them because they tended to
attack the wings rather than 
the body. Butterflies with wet- 
season wings lived longer and 
laid more eggs in the presence 
of mantids than did their dry- 
season fellows. 

Even dry-season butterflies 
with large bright spots pasted 
on their wings showed these 
fitness benefits. 

Proc. R. Soc. B 282, 20141531 
(2014) 


Molecular fan 
opens under light 


Researchers have constructed 
micrometre-sized, stacked 
layers that slide open like a 
folding fan when illuminated.

Yanke Che and his 
colleagues at the Beijing 
National Laboratory for 
Molecular Sciences created 
thin, ribbon-like structures up 
to one micrometre wide. 

The ribbons are composed of 
multiple layers, each consisting 
of pairs of a long, thin molecule 
called perylene diimide. Under 
a blue-green laser, the layers 
slide apart because the photons 
excite electrons and distort 
molecular conformations, the 
researchers say. As a result,
the ribbons expand, reaching 
around 12 micrometres in 
width after 3 minutes. They 
shrink back in seconds when 
exposed to an electron beam. 

Materials that change shape 
under light could have many 
applications, including in 
artificial muscle, the team says. 
Adv. Mater. http://doi.org/f2v7vc 
(2014) 


Leopard-skin 
origins traced 


DNA analysis can reveal the 
origins of products from 
endangered species, which 
could help to curb illegal trade. 
Such goods are often seized 
far from their origins, making 
it hard to know where to 
focus enforcement. Samrat 
Mondol of the National 
Centre for Biological Sciences 
in Bangalore, India, and his 
colleagues designed a DNA test 
that enabled them to trace the 
geographic origins of 40 seized 
leopard pelts (from Panthera 
pardus; pictured) to within 
a few hundred kilometres. 
They compared DNA from 
the pelts to that from blood 
and faecal samples taken 
from 173 leopards, focusing 
on gene variants found in 
certain locations in India. Very 
few of the skins were local to 
their seizure point. Central 
India appears to be a leopard
poaching hotspot.

The technique could easily 
be used for other traded 
species, the authors say. 
Conserv. Biol. http://doi.org/w5s 
(2014) 


Beware tainted 
microbe studies 


DNA contamination is 
ubiquitous in laboratory 
reagents commonly used to 
analyse the microbes that 
inhabit the human body. 

Susannah Salter at the 
Wellcome Trust Sanger 
Institute in Hinxton, UK, 
Alan Walker at the University 
of Aberdeen, UK, and their 
colleagues used off-the-shelf 
DNA-extraction kits and 
two common techniques to 
sequence a pure culture of the 
bacterium Salmonella bongori 
as well as a series of diluted 
versions. Contamination 
by other bacterial species 
increased with each dilution, 
and quickly drowned out the 
original S. bongori signal. 

The team traced at least part 
of the problem to the DNA- 
extraction kits, which are not 
sold as sterile. 

This contamination could 
undermine microbiome 
studies, especially in samples 
that have low microbial 
content, including those from 
spinal fluid, blood and the 
lungs, the authors say. 

BMC Biol. 12, 87 (2014) 


ASTRONOMY 


Merged stars 
dodge black hole 


A mysterious cloud-like object
that survived a close encounter
with a black hole might be a
merged pair of stars.

Andrea Ghez of the
University of California in Los
Angeles and her team used
the Keck telescopes on Mauna
Kea in Hawaii to observe the
object, called G2. In March,
it was nearly engulfed by our
Galaxy’s central supermassive
black hole.

Previous observations
using specific wavelengths
of light indicated that it was
a young cloud of gas, which
would have been stretched or
devoured by the black hole.
But the team’s infrared images
showed no clear change in
G2’s appearance. Instead, the
researchers suggest that the
object is a pair of stars that
have recently merged, perhaps
owing to the presence of the
black hole.

The black hole’s gravity could
be disrupting the dynamics of
nearby binary systems, causing
them to coalesce, according to
the authors.

Astrophys. J. Lett. 796, L8 (2014)


Popular articles
on social media


Unusual reference attracts notoriety


An editorial oversight has turned a report on fish
pigmentation into one of the year’s most talked-about papers.
The study of poeciliid fishes, first published online in July by
the journal Ethology, received scant attention until ecologist
David Harris at the University of California, Davis, tweeted
a screenshot of one of its pages, highlighting this phrase in
parentheses: “Should we cite the crappy Gabor paper here?”
Harris added his own comment on Twitter: “Not sure how
this made it through proofreading, peer review and copy
editing.” In one of dozens of responses, Tim Elfenbein,
managing editor of the journal Cultural Anthropology,
tweeted: “Note to authors: you are ultimately responsible for
the work that bears your name, no matter the level of editing.”


Ethology 120, 1090-1100 (2014)


Based on data from altmetric.com.
Altmetric is supported by Macmillan
Science and Education, which owns
Nature Publishing Group.


NATURE.COM
For more on
popular papers:
go.nature.com/3bswat


Water vapour
predicts flooding


Streams of concentrated water
vapour in the atmosphere
could be used to predict
flooding in Europe more
accurately than rainfall does.
A team led by David Lavers
of the European Centre for
Medium-Range Weather
Forecasts in Reading, UK,
looked at forecasts from last
winter, when the United
Kingdom and other parts of
Europe saw major flooding
(pictured). By incorporating
information on the transport
of water vapour in the
atmosphere, the team found
that scientists could have
predicted flooding in some
areas of Europe by up to three
extra days.

The weather patterns
associated with these
atmospheric rivers do not
break apart as rapidly as
rainfall-related patterns do,
making them more reliable
flood predictors, the team says.
Nature Commun. 5, 5382 (2014)


NATURE.COM
For the latest research published by
Nature visit:
www.nature.com/latestresearch


Climate deal

China and the United
States announced plans to
substantially reduce their
greenhouse-gas emissions
at a summit in Beijing on
12 November. US President
Barack Obama pledged to cut
emissions to 26-28% below
2005 levels by 2025; Chinese
President Xi Jinping said
that his country will stop its
emissions from growing by
2030. The joint announcement
is expected to facilitate
discussions of a global climate
agreement — a successor to
the 1997 Kyoto Protocol — to
be finalized in December 2015
at United Nations climate talks
in Paris. See Nature
http://doi.org/w5f (2014) for more.


No science advice

The European Commission
has abolished the post of chief
scientific adviser three years
after creating it. The mandate
of the outgoing adviser, UK
biologist Anne Glover, ended
last month together with
the previous commission's
term. The new commission
has decided to abolish the
post, Glover told colleagues
by e-mail on 12 November.
Incoming commission
president Jean-Claude
Juncker has said that he values
scientific advice, but has yet
to decide what form it should
take. See go.nature.com/bvfkmy
for more.


Misused money 


Sandia National Laboratories
in Albuquerque, New Mexico,
wrongly used public money
to lobby the US government
to continue a contract with
the defence research firm
Lockheed Martin, according
to the energy department's
Office of Inspector General.
The office’s report, released
on 12 November, found that
the laboratory used taxpayer
funds to convince federal
officials and the US Congress
to extend a contract under
which a Lockheed Martin
subsidiary manages the lab
on behalf of the government
for around $2.4 billion per
year. The firm was non-
competitively awarded a two-
year extension in March 2014.


Bouncing on a comet


This image, taken from the European
Space Agency’s Rosetta spacecraft, captures
the Philae lander as it drifted down onto
comet 67P/Churyumov-Gerasimenko
on 12 November — rebounding as high as
1 kilometre after its first touchdown. After a
second bounce, the lander came to rest in the
shadow of a cliff, from where it took data for
three days before its batteries ran out of power.
Philae may wake again if sufficient sunlight falls
on its solar panels as the comet moves closer to
the Sun. See page 319 for more.


Data breach


Hackers have compromised
four websites run by the
US National Oceanic and
Atmospheric Administration
(NOAA) in recent weeks. The
agency confirmed the breaches
last week, but would not
publicly discuss the suspected
origin of the attacks or which
data had been affected. US
congressman Frank Wolf
(Republican, Virginia) said
that the agency told him that
the Internet attacks came from
China. NOAA reported that
all services had been fully
restored, and that delivery of
weather forecasts to the public
was not interrupted.


Biodiversity boost


The world’s best-managed
conservation sites have been
recognized in a Green List,
unveiled on 14 November at
the World Parks Congress
in Sydney, Australia. The
23 protected areas on the
list, which offer the most
favourable conditions for
flora and fauna, include the
Galeras wildlife sanctuary
in Colombia and the area
around Mount Huangshan in
China. The sites were picked
by the International Union for
Conservation of Nature, based
in Gland, Switzerland, which
has for 50 years maintained a
Red List of threatened species.
The latest edition of that list
was also published at the
Sydney congress. See page 322
for more.


Pharma takeover 


Drug firm Actavis said that
it would become one of the
world’s leading pharmaceutical
companies with a US$66-billion
cash-and-share
takeover of health-care firm
Allergan. Actavis, which is
headquartered in Dublin, said
that the deal, announced on
17 November, would create
a firm with revenues of more
than $23 billion next year, on
a par with 2013’s tenth largest
pharmaceutical firm, Eli Lilly
of Indianapolis, Indiana.
Allergan, in Irvine, California, 
is a leading manufacturer of 
breast implants and anti- 
wrinkle toxin Botox; it had 
been fighting off a takeover 
bid by Canadian firm Valeant 
Pharmaceuticals. 


Caltech lawsuit 


Physicist Sandra Troian is 
suing the California Institute 
of Technology (Caltech) in 
Pasadena, where she is a faculty 
member, alleging that the 
university retaliated against 
her for reporting suspicions 
about possible espionage by
a postdoctoral scholar from
Israel. The 13 November 
lawsuit claims that the 
university impeded her career 
by falsely accusing her of 
research misconduct relating 
to authorship attribution, and 
by denying her grant funding, 
among other things. Caltech 
calls the lawsuit “meritless”. 


EU agency headless
The London-based European
Medicines Agency no longer
has an executive director, after
a tribunal overturned the
appointment of Guido Rasi
(pictured). Rasi has led the
drug-evaluation agency since
late 2011, but shortly after he
was appointed, Emil Hristov,
a former head of Bulgaria's
drug agency, appealed against
the decision after not being
shortlisted for the job. The
agency says that the ruling is
about “a procedural formality”
and is taking legal advice.


TREND WATCH


If current trends continue, China
will overtake the United States
in research and development
(R&D) spending by the end
of the decade, according to a
12 November biennial report
from the Organisation for
Economic Co-operation and
Development (OECD). But
China spends much of its R&D
budget on building infrastructure,
so less money goes into research
than in other countries, says the
OECD's Dominique Guellec. See
Nature http://doi.org/w5r (2014)
for more.


PubPeer brawl 


PubPeer, a website for 
discussing science articles, will 
contest legal action brought 
by a scientist who claims
that anonymous comments
about his work made on the
site are defamatory, it said
last week. Fazlul Sarkar, a
cancer researcher at Wayne
State University in Detroit,
Michigan, had accepted a
tenured post at the University
of Mississippi in Oxford, but
the university withdrew its
offer after it saw the comments.
He filed a lawsuit against the
unknown commenters on
9 October, and subpoenaed
PubPeer to reveal information
about their identities. But
the website's lawyers told
Nature that its owners will
fight the subpoena by arguing
that the comments were not
defamatory. See http://doi.org/w68
for more.


FUNDING
Petaflop power 


Two US laboratories have 
ordered IBM supercomputers 
that will become the nation’s 
fastest when they come online 
in 2017. The machines together 
cost US$325 million and will 
run at up to 150 petaflops
(150 × 10^15 floating-point
operations per second), more
than five times faster than
the Titan system at the Oak
Ridge National Laboratory
in Tennessee. Oak Ridge will
get one of the computers; the 
other will be at the Lawrence 
Livermore National Laboratory 
in California. China’s National 
Supercomputer Center in 
Guangzhou has the world’s 
leading system, Tianhe-2, at 
55 petaflops. See page 324 for 
more. 


US nuclear woes 


The US military may need 
to spend billions of extra 
dollars on its nuclear- 
weapons programme after 
two review panels found 
that it is plagued with low 
morale, ineffective oversight 
and ageing infrastructure. 


In one case, three US bases
housing 450 nuclear weapons
were forced to share the only
wrench capable of attaching
warheads to missiles, sending
the tool to each other using
the courier FedEx. US defence
secretary Chuck Hagel said on
14 November that spending
needs to increase by around
10% over the next half-decade.
The defence department's 2014
budget for nuclear forces is
around US$15 billion.


Chart: Ascending dragon. China’s total research and development (R&D) budget looks set to
overtake that of the United States by 2019. Vertical axis: R&D spending (US$ billions, in 2005 dollars,
based on purchasing power parity); horizontal axis: year, from 2000; lines: United States and China.
Source: OECD.


19-20 NOVEMBER
The Green Climate
Fund, an international
agreement for
channelling money to
developing countries for
climate change, holds
its first pledging
conference in Berlin.
go.nature.com/ybq92e


24 NOVEMBER
The deadline set by
international negotiators
in Vienna to agree a deal
with Iran on curbing
its nuclear programme.
A temporary pact was
agreed a year ago (see
Nature 503, 442; 2013).


Green climate fund 
The Green Climate Fund,
an international agreement
for channelling money to
developing countries to
help them adapt to climate
change, received a landmark
boost from leaders at the G20
Summit last week in Brisbane,
Australia. US President Barack
Obama and Japanese Prime
Minister Shinzo Abe pledged
to contribute US$3 billion
and $1.5 billion, respectively,
to the fund, which is holding
a pledging conference on
19-20 November in Berlin.
Established in 2010, the fund
has now received pledges
from 13 nations, totalling
$7.5 billion.


NEWS IN FOCUS


Crisis-mappers turn to citizen scientists p.321
Complete human genome sequence takes shape p.323
Supercomputer trio will turbocharge science p.324
Research overheads vary widely at US universities p.326

ESA/ROSETTA/PHILAE/CIVA 


MISSION


Philae’s 64 hours of science 


Comet lander is now hibernating, but has already altered our understanding of these objects. 


BY ELIZABETH GIBNEY 


“I get goose bumps talking about this
image,” says Holger Sierks. The photo
is of a metallic, robotic leg against the
rugged surface of comet 67P/Churyumov-Gerasimenko.
For Sierks, principal investigator
of the OSIRIS camera on the Rosetta
spacecraft, which put the robotic lander on
the comet, it is the “image of my life”.

The European Space Agency (ESA)
mission made history on 12 November when
the three-legged Philae probe landed on
Churyumov-Gerasimenko, which is 4 kilometres
in diameter, travels at more than 60,000 kilometres
per hour and is currently 514 million
kilometres from Earth. After a nail-biting three
days in which the elation of Philae’s touchdown
gave way to fears about its power levels after it
ended up at a site almost devoid of sunlight, the
lander went into a potentially terminal standby
on 15 November, its batteries drained.

But that was not before Philae gave each 
of its ten instruments a chance to gather and 
transmit data. Although the plan was for Philae 
to still be collecting data now, powered by its 
solar panels, findings from just 64 hours of 
scientific activity are already changing the way 
that scientists view comets. 

Twice a day, Philae had a contact window 
of 3-4 hours in which to communicate with 
mission control through the Rosetta orbiter. 


That was enough to achieve 90% of what 
scientists had hoped for, says Monica Grady, 
a co-investigator on Philae’s chemical 
analyser, Ptolemy. And for some instruments, 
the lander’s unplanned bounces across the
comet surface, which saw it end up in a
shady spot, might actually have spawned
data that are more interesting than anticipated.
Philae’s dramas began the night before the 
scheduled landing, with computing problems. 
A reboot fixed those, and the team decided 
to go ahead despite a second issue with the 
lander’s thrusters, which were intended to press 
Philae into the comet’s surface until it secured 
itself. But then another mechanism, the harpoons
that were intended to securely attach
the lander, also failed to fire on touchdown.
Even as champagne corks popped at the Euro- 
pean Space Operations Centre in Darmstadt, 
Germany, ESA scientists were unaware that 
Philae was already rebounding. It bounced 
twice — once rising as high as 1 kilometre 
above the comet’s rotating surface — before 
the weak gravity, under which a craft weigh- 
ing 100 kilograms on Earth would weigh just 
1 gram, eventually brought the lander to rest. 
Philae originally hit the flat, sunny region 
that had been carefully selected as its landing 
spot, but after the acrobatics, it ended up 1 kilo- 
metre away on its side, with one leg raised off 
the surface, in the shadow of a rocky-looking 
cliff face. From this inelegant position, where 
it received just 1.5 hours of sunlight in every 
12.4-hour comet rotation, it did not have 
enough power to charge its secondary batteries. 
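
The weight comparison above implies a surface gravity roughly 100,000 times weaker than Earth's. The arithmetic can be sketched as follows, treating the article's "100 kilograms" and "1 gram" as round figures and assuming Earth's standard gravity of 9.81 m/s^2 (a value not given in the article); the function names are illustrative only.

```python
EARTH_GRAVITY = 9.81  # m/s^2; assumed standard value, not stated in the article


def gravity_ratio(earth_equivalent_kg: float, comet_equivalent_kg: float) -> float:
    """Ratio of the comet's surface gravity to Earth's, from the weight comparison."""
    return comet_equivalent_kg / earth_equivalent_kg


def comet_gravity(ratio: float) -> float:
    """Approximate surface gravity on the comet in m/s^2."""
    return ratio * EARTH_GRAVITY


ratio = gravity_ratio(100.0, 0.001)  # 100 kg on Earth "weighs" about 1 g on the comet
print(f"gravity ratio: {ratio:.0e}")
print(f"approximate comet gravity: {comet_gravity(ratio):.1e} m/s^2")
```

With so little gravity holding it down, a lander that fails to anchor itself can drift for a long time after a gentle rebound, which is consistent with the 1-kilometre bounce described above.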


COMET RELIEF 

Despite the bumpy landing, Philae’s 64 hours 
of activity pulled in a haul of good data, which 
are still being processed. The first panoramic 
pictures from its CIVA (Comet Nucleus Infra- 
red and Visible Analyser) camera show a 
surface covered in dust and debris, with rock- 
like materials in a range of sizes. “It's certainly 
rougher than what we thought,” says Stephan 
Ulamec, Philae project manager at the German 
Aerospace Center (DLR) near Cologne. 

Data from another instrument, MUPUS 
(Multi-purpose Sensors for Surface and 
Sub-Surface Science), which includes a 
Coke-can-sized hammer mechanism atop a 
40-centimetre-long rod to probe the comet's 
surface, revealed a surprise: the comet seems to 
have hard ice underneath a 10-20-centimetre 
layer of dust, into which the hammer could not 
probe. “We were expecting a softer layer, with
a consistency like compact snow, or maybe
chalk,” says the DLR’s Tilman Spohn, principal
investigator for MUPUS.

Mission scientists celebrated Philae’s separation.

The hardness of this sub-surface will, along 
with temperature measurements, help scien- 
tists to piece together how the comet's coma 
of gas and dust forms. But it will have to be 
reconciled with the low density of the comet, 
Spohn says. It could be that the ice is porous, 
or that the hardness is specific to the cold, dark 
region where Philae came to rest. 

Another instrument on the lander, ROMAP 
(Rosetta Magnetometer and Plasma Monitor), 
probably benefited from Philae’s two bounces. 
ROMAP will help to answer whether the comet 
has its own magnetic field — which could have 
ramifications for models of planet formation 
— and how the ionized gas that envelops the 
comet changes near its surface; the bounces 
mean extra data points. “If someone designed
a mission for magnetometers, and he was a
very creative person, he would have done it
exactly like that,” says Uli Auster, ROMAP’s
co-principal investigator.


ASTEROIDS ON THE AGENDA


World eyes up Europe’s comet lander


As their European colleagues put a lander
on a comet, US space scientists were thrilled,
and a little envious. “It was not perfect,
but it was amazing,” says Jessica Sunshine,
who studies comets at the University of
Maryland in College Park.

Sunshine’s team designed a ‘comet
hopper’ that would have used nuclear
batteries to jump slowly across a comet’s
surface, but NASA declined to fund it
in 2012. Now the team is working on
an alternative proposal to build on the
questions that Philae is starting to raise.

But first, the focus is shifting to asteroids.
On 30 November, the Japan Aerospace
Exploration Agency plans to launch its
Hayabusa-2 mission to the asteroid 1999
JU3, which will carry, among other things, a
Philae-like lander. In September 2016, NASA
aims to launch the OSIRIS-REx probe, which
will use a robotic arm to vacuum up samples
from the asteroid Bennu, for return to Earth.
Rosetta scientists spent several months
studying their comet before deciding where
they would touch down; the OSIRIS-REx
team plans to do the same. “One of the hard
things about going to these bodies is that
we don’t know what they look like,” says
principal investigator Dante Lauretta, of the
University of Arizona in Tucson.
Congressman Lamar Smith (Republican,
Texas), who heads the House of
Representatives committee that oversees
science and space issues, notes that Rosetta
launched more than a decade ago. “We
must make long-term commitments today,”
he says, “if we want to ensure successes in
space in the future.” Alexandra Witze


320 | NATURE | VOL 515 | 20 NOVEMBER 2014


© 2014 Macmillan Publishers Limited. All rights reserved

Shortly after touchdown, organic molecules 
were detected in samples of the comet's surface, 
courtesy of COSAC, the Cometary Sampling 
and Composition experiment. It is designed 
to probe for such molecules and test whether 
their handedness, or chirality, matches with 
chemical signatures on Earth. But COSAC had 
to wait until the final hours of Philae’s battery 
life before attempting to probe the sub-surface 
because of fears that the drill action would 
cause the unanchored lander to tip over. After 
mission control finally gave the signal to bore 
down, Philae was able to send back data, which 
the COSAC team are now scouring for mol- 
ecules, says co-investigator Uwe Meierhenrich, 
an analytical chemist at the University of Nice 
Sophia Antipolis in France. 

Low power meant that the Ptolemy instru- 
ment, which is designed to analyse chemicals 
and the relative abundance of isotopes, did 
not get a chance to study a sub-surface sample. 
But the team is cautiously optimistic about its 
surface measurements. If they are lucky, by com- 
parison with data from Earth, both Ptolemy and 
COSAC could help to reveal whether comets 
brought substances to Earth that are necessary 
for life, such as amino acids and water. And like 
the magnetic measurements, Ptolemy could 
benefit from Philae’s cross-comet journey. “It’s
a possibility that we got samples from at least
two, possibly three, landing sites,” says Grady.

More data could yet arrive. Before Philae 
shut down, the team instructed it to turn about 
35 degrees and lift its body by 4 centimetres 
to bring the craft’s largest solar panel into the 
light. It could wake up if warming conditions 
allow it to generate enough power to restart as 
the comet gets closer to the Sun. 

In August, the comet reaches perihelion,
its closest point to the Sun, and it will become
“active like hell”, says Sierks. The shade that shut
down the lander in its first days may become its 
welcome parasol, says Meierhenrich. “Now it 
may survive much longer than March. Maybe 
in April, May or June we might regain contact.” 

Rosetta is designed to study Churyumov- 
Gerasimenko over the coming months as it 
swings around the Sun and journeys back out 
into space, and there is now a chance that even 
Philae will be able to operate then too. 

As well as sealing ESA’s place in history, Rosetta’s
success could bring greater spoils (see ‘Asteroids
on the agenda’). The science programme
that funds Rosetta is not up for discussion at
ESA’s ministerial meeting on 2 December, but
member states might now be more willing to
part with cash for discovery projects, says ESA’s
senior science adviser, Mark McCaughrean.
“In the two weeks before landing, there were
concerns that if it didn’t work, that would have
damaging effect,” he says. “We would certainly
hope it would work the other way around.”
DISASTER RESPONSE

Crisis mappers find an ally


Crowdsourced disaster surveys strive for more reliability by using online citizen scientists. 


BY MARK ZASTROW IN NEW YORK CITY 


When Typhoon Haiyan barrelled into
the Philippines on 8 November
2013, more than 1,600 volunteers
leapt to their laptops to make 4.5 million edits
to OpenStreetMap, an online, open global map. 
Working from satellite imagery, the volunteers 
created maps for stricken areas of the islands, 
and tagged buildings that seemed to have been 
damaged or destroyed. The maps were used to 
help aid workers to navigate the terrain, and 
the damage assessments were passed to relief 
organizations to direct aid workers and supplies. 

Although the maps proved invaluable, the 
damage assessments were poor. “The results 
were terrible,” Dale Kunce, a geospatial engi- 
neer at the American Red Cross, told the 
International Conference of Crisis Mappers in 
New York City on the anniversary of Haiyan’s 
landfall. Crisis mappers see the experience not 
as a setback but as a valuable lesson. The take- 
home message, Kunce said, “is that if we'd done 
a couple things differently, the quality would 
have been much higher”. 

The effectiveness of compiling geographic 
information about disasters online was first 
demonstrated on a large scale after the Haiti 
earthquake in January 2010. An informal net- 
work of volunteers began noting the status of 
buildings and infrastructure on an online map 
using news and social-media reports, and later 
incorporated text messages from survivors on 
their status and needs. Craig Fugate, head of the 
US Federal Emergency Management Agency, 
called the Haiti effort “the most comprehensive 
and up-to-date map available to the humanitar- 
ian community”. 

But an analysis of the Typhoon Haiyan data 
in April by the American Red Cross and the 
Reach Initiative, a humanitarian agency based 
in Geneva, Switzerland, made a disheartening 
finding: satellite judgements by the Humani- 
tarian OpenStreetMap Team (HOT), an online 
group of volunteer crisis mappers, matched a 
later ground survey only 36% of the time. Vol- 
unteers tended to miss structural damage in 
most areas, but overestimated it in the densely 
populated city of Tacloban. The report con- 
cluded that current satellite imagery does not 
offer enough detail to allow relatively untrained 
volunteers to assess damage. 

Now the community is developing better 
ways to assess and verify damage in real time. 
Some of the most promising advances are coming
from collaborations with another crowdsourcing
movement that has sprung up in the
past few years: citizen science. This lets anyone
with an Internet connection volunteer to do
labour-intensive tasks requiring little or no
expertise for academic research projects.

Residents of Tacloban in the Philippines burn scrap wood in the aftermath of Typhoon Haiyan.

The Haiyan report identified several ways in 
which online crowdsourcing platforms could 
make satellite assessments more dependable: 
by giving volunteers better guidance on what 
features to look for, providing pre-disaster 
imagery to compare against and improving 
assessments of volunteers’ accuracy. 

When astronomer Brooke Simmons of the
University of Oxford, UK, read the report,
she realized that she already knew how to
do those things, in a different setting. She
studies the evolution of galaxies with the
Zooniverse, the world’s largest citizen-science
project, in which 1.2 million users pore over
old ships’ logs to extract weather data, scan
astronomical images for interesting objects and
transcribe scraps of ancient texts.

“The Zooniverse has been doing this longer
than anyone else,” says Patrick Meier of the
Qatar Computing Research Institute in Doha, 
who leads the Standby Task Force, a crisis- 
mapping team that tracks social-media posts, 
and who has admired the Zooniverse for years. 

Meier and Simmons hope to launch a pilot 
study within weeks using archival images from 
Tacloban. With the help of HOT leader Kate
Chapman, they have secured the release of
post-Haiyan images taken with drones made 
by CorePhil of Quezon City in the Philippines, 
as well as high-resolution pre-disaster satel- 
lite images from DigitalGlobe of Longmont, 
Colorado. Those images will be degraded in 
steps to simulate the more-limited resolution 
of other satellites, and volunteers will be asked 
to use them to identify damaged structures. 
The goal is to determine what damage can be 
seen at what resolution. Meier hopes that up to 
100,000 Zooniverse users will participate. 
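
The article does not say how the images will be degraded. One simple way to simulate a coarser sensor is to average blocks of neighbouring pixels, as in the minimal sketch below. The `degrade` function, the toy 4 × 4 image and the choice of block averaging are all assumptions made for illustration, not the project's stated method.

```python
def degrade(image, factor):
    """Reduce resolution by averaging each factor x factor block of pixels.

    `image` is a list of rows of numeric pixel values. Block averaging is an
    assumed stand-in for a lower-resolution satellite sensor.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    coarse = []
    for r in range(rows):
        coarse_row = []
        for c in range(cols):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 greyscale tile, degraded in one step to 2 x 2.
tile = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]
print(degrade(tile, 2))
```

Applying the function repeatedly, or with larger factors, gives a series of progressively coarser images that volunteers could be asked to assess.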

Simmons is also developing ways to statistically
quantify how confident relief workers can
be in a building’s damage ranking, weighting
users’ input on the basis of how accurate they 
have been in the past. “It brings us to the next 
level — and where we should be,” says Meier. 
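
Weighting volunteers by their track record can be illustrated with a textbook log-odds vote. The sketch below is not the Zooniverse team's method, which the article does not describe. It assumes that each volunteer has a known past accuracy, that volunteers err independently and that a building is equally likely to be damaged or intact before any votes are counted. The name `damage_confidence` is hypothetical.

```python
import math


def damage_confidence(votes):
    """Combine volunteer votes into a confidence that a building is damaged.

    `votes` is a list of (said_damaged, past_accuracy) pairs, with
    past_accuracy strictly between 0 and 1. Each vote shifts the log-odds by
    log(accuracy / (1 - accuracy)), so reliable volunteers count for more.
    """
    log_odds = 0.0  # assumed 50/50 prior
    for said_damaged, accuracy in votes:
        accuracy = min(max(accuracy, 0.01), 0.99)  # avoid infinite weights
        weight = math.log(accuracy / (1.0 - accuracy))
        log_odds += weight if said_damaged else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


# One historically reliable volunteer outweighs two less reliable ones.
votes = [(True, 0.9), (False, 0.6), (False, 0.6)]
print(round(damage_confidence(votes), 3))
```

A map built this way could show the combined confidence alongside the damage label, so that low-confidence assessments are flagged for a second look.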

Meier and Simmons hope that, by spring 
2015, a Zooniverse portal will be ready to deal 
with real-world crises, providing aid workers 
with an interactive map that conveys not just 
the level of damage, but also the confidence in 
those assessments. 

Imagery will be provided for free by Planet 
Labs in San Francisco, California, which 
is launching small satellites to image the 
entire Earth every 24 hours at a resolution of 
3-5 metres. Although that is less detailed than 
imagery from some commercial providers, the 
comprehensive coverage will ensure that pre- 
and post-crisis imagery is available wherever 
the next major disaster strikes.
Australian fur seals swim in protected waters near Montague Island in Australia.

Green List promotes
conservation hotspots


Project pinpoints protected reserves that boost biodiversity. 


BY NATASHA GILBERT 


Conservation groups often highlight
species or ecosystems at risk. An effort
launched on 14 November turns that
approach on its head, seeking for the first time 
to systematically recognize the world’s best- 
managed protected areas, which offer the most 
favourable conditions for flora and fauna. 
The International Union for Conservation 
of Nature (IUCN) unveiled its Green List 
of 23 sites at the World Parks Congress in 
Sydney, Australia. The group, based in Gland, 
Switzerland, has long maintained a Red List 
of threatened species, which scientists and 
governments use as one way to estimate pro- 
gress towards various biodiversity goals. 
By some measures, global conservation 
efforts are succeeding. In 2010, the international
Convention on Biological Diversity
(CBD) set a goal of protecting 17% of Earth’s
land surface and 10% of its oceans by 2020. 
Currently, 15.4% of land areas and 3.4% of 
oceans are set aside as protected areas, accord- 
ing to figures released on 13 November by the 
United Nations Environment Programme. 
But not all conservation areas are created 
equal. For example, Australia’s extensive net- 
work of marine reserves — which includes 
the Great Barrier Reef — has had very little 
impact on marine conservation, researchers 
reported in Aquatic Conservation in February
(R. Devillers et al. Aquat. Conserv. http://doi.org/w6w;
2014). This is because many
reserve locations were chosen to avoid dam- 
aging commercial interests, rather than to 
best protect areas of ecological importance, 
the study found. “Protected areas are of no
use if they are not managed or governed properly,”
says James Hardcastle, who is leading the
Green List project for the IUCN.

In addition, research published on 
14 November in Nature suggests that creating 
protected areas is not enough to safeguard the 
future of plant and animal life (F. M. Pouzols
et al. Nature http://doi.org/w6x; 2014). These 
secured zones currently cover just 19% of the 
habitat of the planet’s terrestrial vertebrate 
species, the study finds. That share could 
triple if the world achieves the 2020 CBD 
conservation target. But land-use changes, 
such as expanding agricultural zones, threaten 
to erode biodiversity. If current trends con- 
tinue, the ranges of almost 1,000 threatened 
species could be halved by 2040. 

Federico Montesino Pouzols, a bioinforma- 
tician at the Rutherford Appleton Laboratory 
in Harwell, UK, and an author of the study, 
says that international collaboration — such 
as the Green List — is essential to create effec- 
tive protected areas. 

The IUCN approved the Green List concept 
in 2012, at its World Conservation Congress 
in Jeju, South Korea. The group asked govern- 
ments to nominate sites for inclusion. These 
were then judged using 20 criteria, such as 
whether a site focuses on protecting species 
only within its boundaries or whether it takes 
a broader approach — for example, by consid- 
ering the health of a species over its full range. 

In the end, the IUCN accepted 23 of 27 can- 
didate sites. The successful sites include the 
Mount Huangshan scenic area in China, 
which was praised for its management of the 
throngs of tourists that visit every year, and the 
Galeras wildlife sanctuary in Colombia, cited 
for a design that captures the region’s varied 
terrain, such as a volcanic complex, mountain 
forests and lowland valleys. 

Green List sites are also judged on how they 
treat people who have historically lived in or 
used the land — addressing human-rights 
advocates’ concerns that protected areas often 
exclude indigenous people. 

This exclusion is still happening in some 
areas. For example, in 2010 the United King- 
dom set up a marine reserve around the 
Chagos Islands in the Indian Ocean. The 
islands’ original inhabitants, who were evicted
in the early 1970s to make way for a US mili- 
tary base, are effectively barred from accessing 
the area by protected-area restrictions. 

“This is one site that won't be getting on to 
the Green List for a while,” says Hardcastle.
‘Platinum’ genome shapes up


Disease sites targeted in assembly of more-complete version of the human genome sequence. 


BY EWEN CALLAWAY 


Geneticists have a dirty little secret. More
than a decade after the official completion
of the Human Genome Project, and
despite the publication of multiple updates, the
sequence still has hundreds of gaps, many in
regions linked to disease. Now, several research
efforts are closing in on a truly complete human
genome sequence, called the platinum genome.

“It’s like mapping Europe and somebody
says, ‘Oh, there's Norway. I really don’t want to
have to do the fjords’,” says Ewan Birney, a computational
biologist at the European Bioinformatics
Institute near Cambridge, UK, who was
involved in the Human Genome Project. “Now 
somebody’s in there and mapping the fjords.” 

The efforts, which rely on the DNA from 
peculiar cellular growths, are uncovering DNA 
sequences not found in the official human 
genome sequence that have potential links 
to conditions such as autism and the neuro- 
degenerative disease amyotrophic lateral 
sclerosis (ALS). 

In 2000, then US President Bill Clinton 
joined leading scientists to unveil a draft 
human genome. Three years later, the project 
was declared finished. But there were caveats: 
that human ‘reference’ genome was more than 
99% complete, but researchers could not get to 
100% because of method limitations. 

Sequencing machines cannot process 
entire chromosomes, so scientists must first 
make many identical copies of the DNA and 
cut them into short stretches, with the breaks 
in different places. After sequencing, a com- 
puter program looks for overlapping patterns 
to ‘stitch’ the resulting segments back together. 
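
The stitching step described above can be illustrated with a toy greedy assembler that repeatedly merges the two fragments sharing the longest overlap. This is a minimal sketch under simplifying assumptions (error-free reads, a single copy of the sequence, no repeats), not the software used by the Human Genome Project. The function names and the example fragments are made up for illustration.

```python
def overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def stitch(fragments, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps: the remaining pieces are separated by gaps
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Three overlapping fragments cut from one short, made-up sequence.
print(stitch(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
# Fragments with no shared ends cannot be joined and stay separate.
print(stitch(["AAAC", "GGGT"]))
```

When fragments share no usable overlap, the assembler is left with several separate pieces, which is the toy equivalent of a gap in the reference sequence.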

This approach worked for most of the 
genome, because DNA sequences are almost 
identical across its three billion ‘letters’ (the As, 
Cs, Ts and Gs). But in some parts, big differences 
exist between the versions of chromosomes that 
an individual inherits from the mother and 
father. Attempts to stitch together these regions 
to sequence the DNA led to gaps when the differ- 
ing sequences gave conflicting solutions. 

The problem can be likened to assembling a
single jigsaw puzzle from the mixed-up pieces of
similar, but not identical, puzzles. If one puzzle
piece is identical across the sets, any copy of it
will do. But if one set contains a much larger version
of the matching piece, or if a piece is missing,
the puzzle will not fit together. In particular,
long, repetitive stretches near genes vexed the
computer algorithms used to analyse the data.
And the problem was made worse because DNA
from multiple people was used, adding to the
variation between the genomes.


TO SIMPLIFY A SEQUENCE


To produce uninterrupted DNA sequences of human chromosomes, geneticists are turning to hydatidiform
moles. These are formed when a sperm cell enters an egg that has lost its nucleus, making it non-viable.

As a result, when a person’s genome is 
sequenced (for instance, to look for the cause
of a disease), crucial bits of DNA may be overlooked
because they do not have counterparts in
the published genome. “There's a whole level of
genetic variation that we're missing,” says Evan 
Eichler, a genome scientist at the University of 
Washington in Seattle, a leading proponent of 
the platinum-genome efforts. To plug the gaps, 
researchers need a supply of human cells with 
just a single version of each chromosome, to 
remove the possibility of conflicting solutions:
a single set of puzzle pieces, in other words.

Sperm and egg cells contain a single copy 
of each chromosome, but these cells cannot 
divide and produce copies of themselves. So 
in recent years, geneticists have turned to cells
from growths called hydatidiform moles, which
form when a sperm fertilizes an egg that
is missing its own genetic material (see
‘To simplify a sequence’). The fertilized cell
copies its genome and starts dividing, just as 
the cells in a normal fertilized egg would. The 
resulting ball of cells, which is usually removed 
in the first trimester of pregnancy, contains 
identical copies of each human chromosome. 
Cells taken from one such mole were used in 
the early 1990s to create a cell line called CHM1. 
In a Nature paper published on 10 November,
Eichler and his colleagues describe how they
used sections of the CHM1 genome to fill
about 50 especially troublesome holes in the 
official human genome sequence. They also 
shortened many more gaps, including in genes 
linked to ALS and Fragile X syndrome, a neurodevelopmental
disease with autism-like symptoms
(M. J. P. Chaisson et al. Nature http://doi.org/w69;
2014). In total, the team mapped
around 1 million DNA letters that were miss- 
ing in the original reference genome. 

A true platinum sequence will be assem- 
bled from just one genome, however, because 
only then can scientists be sure there are no 
remaining gaps. To this end, a team led by 
Richard Wilson at Washington University in 
St. Louis, Missouri, reported a draft sequence 
of the entire CHM1 genome earlier this month 
(K. M. Steinberg et al. Genome Res. http://doi.org/w7b;
2014). Researchers at the firm Pacific
Biosciences in Menlo Park, California, are sim- 
ilarly working on the whole CHM1 genome, 
but are using sequencers that work with longer 
stretches of uninterrupted DNA, and so pro- 
duce fewer gaps than typical sequencers. The 
firm released a draft genome assembly in Febru- 
ary. The hope is that the method will speed up 
the platinum genome’s arrival.

“The chances of actually achieving this, for 
one genome, are looking much better,” says
Deanna Church, a genome scientist at the firm 
Personalis in Menlo Park. Still, Birney says that 
the human reference genome is more about 
“constant improvement” than completion. “For 
sure, somebody’s going to be fiddling around 
with this in 10-20 years’ time.”
TECHNOLOGY


Joint effort nabs next wave 
of US supercomputers 


National laboratories collaborate to purchase top-flight machines. 


BY ALEXANDRA WITZE 


Once locked in an arms race with each
other for the fastest supercomputers,
US national laboratories are now
banding together to buy their next-generation 
machines. 

On 14 November, the Oak Ridge National 
Laboratory (ORNL) in Tennessee and the 
Lawrence Livermore National Laboratory 
in California announced that they will each 
acquire a next-generation IBM supercomputer 
that will run at up to 150 petaflops. This means 
that the machines can perform 150 million bil- 
lion floating-point operations per second, at 
least five times as fast as the current leading US 
supercomputer, the Titan system at the ORNL. 
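
The unit arithmetic in this paragraph can be checked with a short sketch. It is illustrative only: the constant and function names are assumptions introduced here, and the 27-petaflop figure for Titan is the one quoted later in this article.

```python
# Illustrative unit check; names are hypothetical, not from the article.
PETA = 10**15  # one petaflop is 10^15 floating-point operations per second


def petaflops_to_flops(petaflops):
    """Convert a speed in petaflops to operations per second."""
    return petaflops * PETA


def speedup(new_petaflops, old_petaflops):
    """Ratio of two peak speeds given in the same unit."""
    return new_petaflops / old_petaflops


# 150 petaflops written as "150 million billion" operations per second.
new_peak = petaflops_to_flops(150)
same_value = 150 * 10**6 * 10**9

# Peak of the new machines relative to Titan's 27 petaflops.
ratio_vs_titan = speedup(150, 27)
```

With these figures the ratio falls between five and six, consistent with "at least five times as fast".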

The new supercomputers, which together 
will cost US$325 million, should enable new 
types of science for thousands of researchers 
who model everything from climate change 
to materials science to nuclear-weapons 
performance. 

“There is a real importance of having the 
larger systems, and not just to do the same 
problems over and over again in greater detail,” 
says Julia White, manager of a grant pro- 
gramme that awards supercomputing time at 
the ORNL and Argonne National Laboratory 
in Illinois. “You can actually take science to 
the next level.” For instance, climate modellers 
could use the faster machines to link together 
ocean and atmospheric-circulation patterns in 
a regional simulation to get a much more accu- 
rate picture of how hurricanes form. 

Building the most powerful supercomputers 
is a never-ending race. Almost as soon as 
one machine is purchased and installed, 
lab managers begin soliciting bids for the 
next one. Vendors such as IBM and Cray 
use these competitions to develop the next 
generation of processor chips and architec- 
tures, which shapes the field of computing 
more generally. 

In the past, the US national labs pursued 
separate paths to these acquisitions. Hoping 
to streamline the process and save money, 
clusters of labs have now joined together to 
put out a shared call — even those that per- 
form classified research, such as Livermore. 
“Our missions differ, but we share a lot of commonalities,”
says Arthur Bland, who heads the
ORNL computing facility. 


NEXT STOP EXAFLOP 


The speed of the world’s most powerful 
supercomputer has grown more than five orders 
of magnitude in the past two decades. 
In June, after the first such coordinated
bid, Cray agreed to supply one machine to a 
consortium from the Los Alamos and Sandia 
national labs in New Mexico, and another to the 
National Energy Research Scientific Comput- 
ing (NERSC) Center at the Lawrence Berkeley 
National Laboratory in Berkeley, California. 
Similarly, the ORNL and Livermore have 
banded together with Argonne. 

The joint bids have been a learning experi- 
ence, says Thuc Hoang, programme manager 
for high-performance supercomputing 
research and operations with the National 
Nuclear Security Administration in Washing- 
ton DC, which manages Los Alamos, Sandia 
and Livermore. “We thought it was worth a try,” 
she says. “It requires a lot of meetings about 
which requirements are coming from which 
labs and where we can make compromises.” 

At the moment, the world’s most powerful 
supercomputer is the 55-petaflop Tianhe-2 
machine at the National Super Computer 
Center in Guangzhou, China. Titan is sec- 
ond, at 27 petaflops. An updated ranking of 
the top 500 supercomputers was announced 
on 18 November at the 2014 Supercomputing 
Conference in New Orleans, Louisiana. 

When the new ORNL and Livermore 
supercomputers come online in 2017, they 
will almost certainly vault to near the top 
of the list, says Barbara Helland, facilities- 
division director of the advanced scientific 
computing research programme at the 
Department of Energy (DOE) office of
science in Washington DC. 

The new supercomputers, to be called 
Summit and Sierra, will be structurally similar 
to the existing Titan supercomputer. They will 
combine two types of processor chip: central 
processing units, or CPUs, which handle the 
bulk of everyday calculations; and graphics 
processing units, or GPUs, which generally 
handle three-dimensional computations. 
Combining the two means that a supercom- 
puter can direct the heavy work to GPUs and 
operate more efficiently overall. And because 
the ORNL and Livermore will have similar 
machines, computer managers should be able 
to share lessons learned and ways to improve 
performance, Helland says. 

Still, the DOE wants to preserve a little 
variety. The third lab of the trio, Argonne, 
will be making its announcement in the 
coming months, Helland says, but it will use 
a different architecture from the combined 
CPU-GPU approach. It will almost certainly 
be like Argonne’s current IBM machine, which 
uses a lot of small but identical processors 
networked together. The latter approach 
has been popular for biological simulations, 
Helland says, and so “we want to keep the two 
different paths open”.

Ultimately, the DOE is pushing towards 
supercomputers that could work at the 
exascale, or 1,000 times more powerful than 
the current petascale (see ‘Next stop exaflop’).
Those are expected around 2023. But the more 
power the DOE labs acquire, the more scien- 
tists seem to want, says Katie Antypas, head of 
the NERSC’s services department. 

“There are entire fields that didn’t used to 
have a computational component to them,” such 
as genomics and bioimaging, she says. “And 
now they are coming to us asking for help.”


CORRECTION 

The News story ‘“Forgotten” NIH smallpox
virus languishes on death row’ (Nature 
514, 544; 2014) wrongly said that the 
WHO Advisory Committee on Variola Virus 
Research agreed to commission a report 
on the bioterror threat from synthesized 
smallpox — that report was actually 
commissioned before the committee met. 


SOURCE: TOP500.ORG


KEEPING THE 
LIGHTS ON 


Every year, the US government 
gives research institutions billions 
of dollars towards infrastructure 
and administrative support. A 
Nature investigation reveals who is 
benefiting most. 


BY HEIDI LEDFORD 
Last year, Stanford University in
California received US$358 million
in biomedical-research funding from
the US National Institutes of Health
(NIH). Much of that money paid directly for 
the cutting-edge projects that make Stanford 
one of the top winners of NIH grants. But for 
every dollar that Stanford received for science, 
31 cents went to pay for the less sexy side of 
research: about 15 cents for administrative sup- 
port; 7 cents to operate and maintain facilities; 
1 cent for equipment; and 2 cents for libraries, 
among other costs. 

The NIH doled out more than $5.7 billion 
in 2013 to cover these ‘indirect’ costs of 
doing research — about one-quarter of its 
$22.5-billion outlay to institutions around the 
world (see ‘Critical calculations’). That money 
has not been distributed evenly, however: 
research institutions negotiate individual rates 
with government authorities, a practice that is 
meant to compensate for the varying costs of 
doing business in different cities and different 
states. Data obtained by Nature through a Free- 
dom of Information Act request reveal the dis- 
parities in the outcomes of these negotiations: 
the rates range from 20% to 85% at universities, 
and have an even wider spread at hospitals and 
non-profit research institutes. The highest nego- 
tiated rate in 2013, according to the data, was 
103% — for the Boston Biomedical Research 
Institute (BBRI) in Watertown, Massachusetts. 
It went bankrupt and closed the same year. 

Faculty members often chafe at high over- 
heads, because they see them as eating up a por- 
tion of the NIH budget that could be spent on 
research. And lack of transparency about how 
the money is spent can raise suspicions. “Some- 
times faculty feel like they’re at the end of the 
Colorado River,” says Joel Norris, a climatologist
at the University of California, San Diego. “And
all the water’s been diverted before it gets to
them.”

Nature compared the negotiated rates, as 
provided by the US Department of Health 
and Human Services, to the actual awards 
given to more than 600 hospitals, non-profit 
research institutions and universities listed in 
RePORTER, a public database of NIH funding 
(see ‘Overheads under the microscope’). The 
analysis shows that institutions often receive 
much less than what they have negotiated, 
thanks to numerous restrictions placed on what 
and how much they can claim. Administrators 
say that these conditions make it difficult to 
recoup the cash they spend on infrastructure. 

In addition, new administrative regula- 
tions have meant that universities have had to 
increase their spending, even as federal and state 
funding for research has diminished. “We lose 
money on every piece of research that we do,” 
says Maria Zuber, vice-president for research 
at the Massachusetts Institute of Technology 
(MIT) in Cambridge, which has negotiated a 
rate of 56%. 

But many worry that the negotiation process 


CRITICAL CALCULATIONS 
What are indirect costs?


Indirect costs — often called facilities-and- 
administrative costs — are expenses that 
are not directly associated with any one 
research project. This includes libraries, 
electricity, administrative expenses, facilities 
maintenance and building and equipment 
depreciation, among other things. 

The United States began reimbursing 
universities for indirect costs in the 1950s, 
as part of a push to encourage more 
research. An initial cap was set at 8%, but 
that had risen to 20% by 1966, when the 
government began to allow institutions 
to negotiate their rates. Institutions were 
assigned to negotiate with either the US 
Department of Health and Human Services 
or the Office of Naval Research, depending 
on which supplied the bulk of their research 
funding. And the agreed rate holds across 


all federal funders, irrespective of where the 
negotiations took place. 

A common misconception is that indirect-cost
rates are expressed as a percentage of
the total grant, so a rate of 50% would mean 
that half of the award goes to overheads. 
Instead, they are expressed as a percentage 
of the direct costs to fund the research. So, 

a rate of 50% means that an institution 
receiving $150 million will get $100 
million for the research and $50 million, 
or one-third of the total, for indirect costs. 
But there are multiple caps that lower 

the base amount from which the indirect 
rate is calculated, or that limit the amount 
of money that a research institution can 
request. So very few institutions receive the 
full negotiated rate on the direct funding 
they receive. H.L.
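
The worked example in this box can be written out as a small calculation. The sketch below assumes the simplest case, with no caps or exclusions applied to the base; the function name is hypothetical.

```python
def split_award(total, rate):
    """Split a total award into (direct, indirect) costs.

    Assumes the rate is a fraction of direct costs and that no caps
    reduce the base, which the box notes is rarely true in practice.
    """
    direct = total / (1 + rate)
    indirect = total - direct
    return direct, indirect


total = 150_000_000
direct, indirect = split_award(total, 0.50)

# Share of the whole award that goes to overheads at a 50% rate.
indirect_share = indirect / total
```

At a 50% rate the overhead share is one-third of the total, not one-half, which is the misconception the box describes.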


allows universities to lavish money on new 
buildings and bloated administrations. “The 
current system is perverse,” says Richard 
Vedder, an economist at Ohio University in Ath- 
ens who studies university financing. “There is a 
tendency to promote wasteful spending.”


GLOBAL DISPARITY 

Reimbursement for overheads is dealt with 
differently around the world. The United King- 
dom calculates indirect costs on a per-project 
basis. Japan has a flat rate of 30%. And last year, 
to the dismay of some institutions, the European 
Union announced that it would no longer nego- 
tiate rates and instituted a flat rate of 25% for 
all grant recipients in its Horizon 2020 funding 
programme (see Nature 499, 18-19; 2013). 

The comparatively high overhead 
reimbursement in the United States has gen- 
erated envy, and at times controversy. About 
20 years ago, government auditors found that 
Stanford was using funds for indirect costs to 
cover the depreciation in value of its 22-metre 
yacht moored in San Francisco Bay, and to buy 
decorations for the president’s house, including 
a $1,200 chest of drawers. 

Other universities — including MIT and 
Harvard University in Cambridge — soon 
came forward to correct overhead claims that 
they feared would be perceived as inappropri- 
ate. In the end, Stanford paid the government 
$1.2 million and accepted a large reduction — 
from 70% to 55.5% — in its negotiated rate. But 
the damage was done. The government layered 
on new regulations, including an explicit ban 
on reimbursement for housing and personal liv- 
ing expenses, and a 26% cap on administrative 
costs, although only for universities. 

Two decades later, researchers still worry 
that the system carries the taint of impropriety. 


Administrators say that changes at some 
institutions — such as increased transparency 
about spending and how indirect costs are 
calculated — have allayed faculty concerns. 
But not everywhere. “People often think this is 
about secretarial staff and bloating the mid-level 
research administration,” says Tobin Smith,
vice-president for policy at the Association of 
American Universities in Washington DC. “The 
faculty doesn’t often think about all the other 
costs: the lights are on, the heat is on, you're 
using online services the university provides.” 

Despite the high level of scrutiny for 
universities, they did not top the chart for nego- 
tiated rates in the data that Nature collected. Few 
universities have rates above 70%, and they 
would probably face an outcry from faculty if 
they raised rates too high, says Samuel Traina, 
vice-chancellor for research at the University of 
California, Merced. 

No such threshold seems to exist at non- 
profit research institutes: more than one-quarter 
of the 198 institutes for which Nature obtained 
data negotiated rates above 70%. Fourteen of 
them have rates of 90% or higher, meaning that 
their indirect costs come close to equalling their 
direct research funding. According to Robert 
Forrester, an independent consultant in Bel- 
mont, Massachusetts, who helps institutions to 
determine their indirect costs, these institutes 
need to negotiate higher rates because the entire 
facility is dedicated to research, whereas univer- 
sities and hospitals also use facilities for other 
things, such as teaching, that generate funding 
and must share the burden. 

Comparisons of negotiated rates against the 
RePORTER data mined by Nature come with 
caveats. For example, many smaller institutions 
negotiate a provisional rate with the NIH that is 
later adjusted to match actual overhead costs, 
OVERHEADS UNDER THE MICROSCOPE


In 2013, the US National Institutes of 
Health (NIH) awarded more than 

US$5 billion to research institutes for 
indirect costs: shared overhead 
expenses such as lighting, heat and 
maintenance. Institutes negotiate the 
rate at which they will be reimbursed, 
and it is expressed as a percentage of 
the direct costs for research in a grant. 
Data obtained by Nature reveal the 
disparity in the outcomes of these 
negotiations and show that the amount 
received is usually much lower than 
that negotiated. 
UNIVERSITIES
Received $3.9 billion, at
an average rate of 31%


NON-PROFIT RESEARCH INSTITUTES
Received $611 million, at
an average rate of 38%


HOSPITALS 
Received $550 million, at 
an average rate of 38% 


STANFORD UNIVERSITY 
Total funding: $357,812,990 
Negotiated rate: 57% 
Calculated rate: 43% 


BOSTON BIOMEDICAL 
RESEARCH INSTITUTE 
(funding figures from 2012) 


Total funding: $5,802,769 
Negotiated rate: 103% 
Calculated rate: 67% 
*Institutes can seem to receive higher than their negotiated rates for various reasons.
Institutions sometimes negotiate higher rates for specific projects, for example.


The 10 universities that get the most money from the NIH together received more than $1.1 billion towards their 
indirect costs. Their negotiated and calculated rates were slightly higher than the average for all universities. 
so some grants in RePORTER seem to have a
reimbursed rate that exceeds the negotiated 
value. A change to the negotiated rate in the 
middle of a year can also cause a disconnect 
between the data Nature obtained and the rates 
given in RePORTER.

But overall, the data support administrators’ 
assertions that their actual recovery of indirect 
costs often falls well below their negotiated rates. 
Across the data set, the average negotiated rate is 53%,
and the average reimbursed rate is 34%.

The shortfall is largely due to caps imposed 
by the NIH on some grants and expenditures, 
says Tony DeCrappeo, president of the Council 
on Governmental Relations (COGR), an asso- 
ciation in Washington DC that is focused on 
university finance. Some training grants, such 
as ‘K’ awards for early-career investigators, cap 
indirect costs at 8%. The NIH also does not 
award money for conference grants, fellow- 
ships or construction. And it has placed limits 
on specific categories, such as costs associated 
with research using genomic microarrays. 

Such restrictions can make it hard to make 
ends meet, says Eaton Lattman, who heads 
the Hauptman-Woodward Medical Research 
Institute in Buffalo, New York. The institute 
negotiated a rate of 94%, but received just 52%. 
Although it does not incur some of the costly 
administrative burdens of hospitals or universi- 
ties, it still fails to recoup its full investment on 
research, Lattman says. 

The increasing competition for NIH grants is 
a major factor in that. Because funds for indirect
costs cannot be used to support researchers who 
lose grants or have yet to win one, Hauptman- 
Woodward must draw from its endowment 
to keep them working until they can support 
themselves. “If you don’t want to kill their 
research career, you have to provide bridge 
funding,” Lattman says.

The BBRI faced similar strains. The institute 
was dependent on NIH funding, and could not 
cope when the NIH budget tightened and fac- 
ulty members brought in less grant money (see 
Nature 491, 510; 2012). “The general cost of 
operating the organization did not diminish as 
fast as the direct dollars,” says Charles Emerson, 
former head of the institute and now a devel- 
opmental biologist at the University of Massa- 
chusetts Medical School in Worcester. “So we 
were able to negotiate a higher rate at the end of 
our time there, just to keep the operation going.”

By 2012, the BBRI’s negotiated rate had 
swelled to 103%, the highest for any organi- 
zation in the data provided to Nature. But it 
ended up recouping just 70%, or $2.4 million 
on $3.4 million in direct funding. 
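
The gap between a negotiated and a recovered rate can be made concrete with the BBRI figures quoted here. This is a rough sketch using the rounded dollar amounts from the text. The helper name is hypothetical, and the shortfall line assumes the negotiated rate would have applied to all direct funding, which the caps described above make unlikely.

```python
def effective_rate(indirect, direct):
    """Recovered indirect costs as a fraction of direct funding."""
    return indirect / direct


direct_funding = 3.4e6   # rounded figure from the article
indirect_paid = 2.4e6    # rounded figure from the article
negotiated_rate = 1.03   # 103%

recovered = effective_rate(indirect_paid, direct_funding)

# Hypothetical ceiling if the full negotiated rate applied to every dollar.
ceiling = negotiated_rate * direct_funding
shortfall = ceiling - indirect_paid
```

With these rounded inputs the recovered rate comes to roughly 0.71, in line with the 70% quoted, and the hypothetical shortfall is about $1.1 million.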

Although non-profit institutes command 
high rates, together they got just $611 million 
of the NIH’s money for indirect costs. The 
higher-learning institutes for which Nature 
obtained data received $3.9 billion, with more 
than $1 billion of that going to just nine institu- 
tions, including Johns Hopkins University in 
Baltimore, Maryland, and Stanford (see ‘Top 10
earners’). At 38%, the average rate for these nine
institutions is about 4% higher than that for all 
institutions with available data. But the range for 
higher-learning institutions was wide, with one 
receiving 62% (York College in Jamaica, New 
York), and one receiving just under 3% (Dillard 
University in New Orleans, Louisiana). 


SHORT CHANGE 

Even if universities did receive the full, negotiated 
rate, it would still be less than the actual costs 
of supporting research, says DeCrappeo. The 
cap on administrative costs that emerged in 
the wake of the Stanford scandal has remained 


“THE RESEARCH 
BUREAUCRACY HAS 
INFLATED WILDLY IN 

UNIVERSITIES AND ITS 
EXPENSIVE.” 


unchanged even though administrative burdens 
have swelled. COGR members maintain that 
their actual costs are about 5% higher than the 
cap, says DeCrappeo. The rest of the money must 
come from other revenue, such as tuition fees, 
donations and endowments. 

The best solution, according to Barry 
Bozeman, who studies technology policy at 
Arizona State University in Phoenix, is not to 
raise the cap, but to cut costs by getting rid of 
administrative rules and regulations that are 
simply wasting time and money. “The research 
bureaucracy has inflated wildly in universities 
and it is expensive.” That inflation, he says, is
evident in grant applications. Thirty years ago, 
administrative requirements associated with 
grants were relatively low. “Nowadays, the actual 
content of the proposal — what people are going 
to do and why it’s important — is always a small 
fraction of what they submit,” he says.

As an illustration of the growing bureaucracy, 
DeCrappeo says that when the COGR began to 
keep a guide to regulatory requirements for its 
members in 1989, the document was 20 pages 
long. Now it is 127 pages. And Bozeman says 
that he has to fill out forms relating to the care of 
laboratory animals when he applies for grants, 
even though he has never used animals. 

The regulatory burden can be particularly 
high for medical schools, which must adhere to 
regulations for human-subject research, privacy 
protection and financial conflicts of interest, 
among others. The Association of American 
Medical Colleges in Washington DC says that 
70 of its members have spent $22.6 million 
implementing conflicts-of-interest reporting 
guidelines that came into effect this year. 

Other funders place strict limits on their 
reimbursements. The US Department of
Agriculture, for example, caps many of its 
reimbursements at 30%. Many philanthropic 
organizations do not reimburse for overheads 
at all, and those that do often pay less than the 
government rate (see Nature 504, 343; 2013). As 
a result, some institutions are reluctant to allow 
researchers to apply for such grants — provid- 
ing another source of friction between faculty 
members and the administration. 

Tight budgets and fierce competition for 
federal grants mean that faculty members are 
keenly sensitive to anything that might affect 
how much money they receive, says Lattman. 
Recipients of grants from the National Science 
Foundation (NSF) are particularly rankled, he 
says, because the NSF allocates money for indi- 
rect costs — at the federal negotiated rate — from 
the total grant awarded. In other words, research- 
ers told that they will receive a $1-million NSF 
grant might see only 60% of the money. The NIH, 
by contrast, typically gives faculty members the 
full $1 million and then reimburses indirect costs 
in a separate payment to the university.
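
The difference between the two funding models comes down to whether overheads are taken out of the headline figure or added on top. The following is a minimal sketch, assuming the rate applies to all direct costs with no caps. The 67% rate is a hypothetical value chosen because it reproduces the 60% example in the text, and the function names are illustrative.

```python
def from_total(award, rate):
    """Overheads carved out of the headline award (NSF-style, as described)."""
    direct = award / (1 + rate)
    return direct, award - direct


def on_top(award, rate):
    """Overheads paid separately on top of the award (NIH-style, as described)."""
    return award, award * rate


def implied_rate(direct_share):
    """Rate at which researchers keep `direct_share` of a carved-out award."""
    return 1 / direct_share - 1


award = 1_000_000
nsf_direct, nsf_indirect = from_total(award, 0.67)
nih_direct, nih_indirect = on_top(award, 0.67)
rate_for_60_percent = implied_rate(0.60)
```

Under this simplified model, keeping 60% of a carved-out award corresponds to a rate of about 67% on direct costs.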

Even so, would-be NIH grant recipients often 
fear that a high indirect-cost rate at their insti- 
tution will hurt their chances of getting a grant 
funded, despite the lack of evidence supporting 
any such trend. Others are troubled by the lack 
of transparency at many institutions as to how 
the indirect costs are calculated and the funds 
distributed. Because indirect-cost revenue is 
considered a reimbursement for money the 
university has already spent, much of the cash 
received from the government disappears into 
a university’s general fund. “Faculty have always
been somewhat in the dark,” says Edward Yelin, 
who studies health policy at the University of 
California, San Francisco. 

Although the payout for indirect costs is high, 
officials at the NIH say that the proportion of 
the NIH budget dedicated to overheads has held 
steady for more than two decades. When a 2013 
report by the US Government Accountability 
Office warned that indirect costs could begin 
to eat up an increasing proportion of the NIH’s 
research budget, the NIH countered that this 
was unlikely. 

DeCrappeo is hopeful that regulations due to 
come into effect in December will rein in the 
proliferation of caps on indirect cost rates. The 
regulations will require officers at agencies such 
as the NIH to have any new caps on overhead 
reimbursement approved by the head of the 
agency and provide a public justification for the 
change. DeCrappeo says that this could lead to 
a more transparent process. 

And for those who fret about where this 
money is going, DeCrappeo urges them to look 
beyond their own research programmes. “If all 
you're concerned about is the direct costs, it 
won’t take long for your facilities to deteriorate,”
he says. “You can’t do research on the quad.”


Heidi Ledford writes for Nature from 
Cambridge, Massachusetts. 
FAR-FLUNG
PHYSICS 


The International Centre for Theoretical Physics 
was set up to seed science in the developing world; 
100,000 researchers later, it is still growing. 


The dust in Kathmandu cloaks
everything. It carpets the streets
with a dingy layer. Women cutting
waist-high grass are wearing face
masks to keep it out. And it settles
on the dilapidated buildings of Tribhuvan
University (TU) — the biggest scientific estab- 
lishment in Nepal. 

Narayan Adhikari, however, has managed to 
stay clean. Clad in an impeccable white shirt and 
black trousers, he adds his motorbike to a col- 
lection of some 20 others parked haphazardly 
in front of a 3-storey building, the university's 
physics department. Before entering his tiny lab, 
the 44-year-old researcher removes his shoes to 
keep the dirt out. In the lab are a dozen desktop 
computers, which the department received in 
2009 — before that, there were none. Power 
blackouts happen every day, lasting for up to 
16 hours, and the Internet connection works 
“maybe one day a month”, Adhikari says.

Despite this, for the past eight years Adhikari 
and his students have been producing a stream 
of theoretical-physics papers on the properties 
of materials such as atom-thick graphene. It is a 
rare — if not unique — achievement for a phys- 
ics lab in Nepal, and Adhikari’s contributions 
are also helping to build up his department as 
a whole, by boosting the number of PhD stu- 
dents being trained there. “Doing physics in a 
country like Nepal is a real challenge,” he says. 


BY KATIA MOSKVITCH 


Adhikari’s accomplishments are rooted in 
more than his own determination and wit; 
they also draw on support from the Inter- 
national Centre for Theoretical Physics (ICTP), 
an organization based a world away in the 
picturesque Italian seaside town of Trieste. Set 
up in 1964 by Pakistani physics Nobel laure- 
ate Abdus Salam and Italian physicist Paolo 
Budinich, it aims to advance theoretical phys- 
ics in the developing world. Salam, who died 
in 1996, wanted the centre to be “a home away 
from home” for researchers from the poorest
regions of the world. After they passed through 
the ICTP’s programmes of training and 
research, he hoped that alumni would establish 
scientific communities in their home countries, 
rather than settling abroad as so many scien- 
tists did. Adhikari, who completed the ICTP’s 
one-year postgraduate-diploma programme 
in 1998, is one of the institute’s success stories. 


GLOBAL REACH 

Adhikari is hardly the only one. In the 50 years 
since it was established, the ICTP has trained 
more than 100,000 scientists from 188 coun- 
tries through its workshops and courses. 
Researchers who studied there have contrib- 
uted to major discoveries in fields ranging 
from string theory and neutrino physics to 
climate change, and have racked up a trophy 
cabinet of academic prizes, including shares 
in a pair of Nobels. Most physicists credit the
institute with stemming the brain drain and 
bolstering academia in the developing world. 
The institute is “widely admired”, says Mar- 
tin Rees, an astrophysicist at the University 
of Cambridge, UK, and former head of the 
Royal Society in London, who hopes that it 
will “inspire the creation of similar institutions 
covering other scientific fields”. 

The ICTP has evolved over time. What 
started out as a small project focused narrowly 
on Salam’s discipline — high-energy physics 
— has morphed into a broader programme. 
In 1998, the institute expanded its brief to 
include mathematics and Earth-systems 
physics, including climate and geophysics, 
and in 2014 it added quantitative life sciences. 
The institute is still changing. In the past two 
years it has opened satellite campuses in Brazil, 
Mexico and Turkey, and it is currently estab- 
lishing branches in Rwanda and China. Plans 
to expand into more countries and disciplines 
are being considered. 

But some worry about the organization's 
future. The main provider of the ICTP’s fund- 
ing, the Italian government, has started to 
baulk at shouldering most of its costs, and 
some scientists are concerned that expand- 
ing could dilute the quality of ICTP-fuelled 
research. “In the last few years ICTP has 
started many new things,” says Chris Llewellyn


CENTRAL DEPT OF PHYSICS, TRIBHUVAN UNIV. 


Tribhuvan University in Kathmandu has built up its physics department with support from the International Centre for Theoretical Physics. 


Smith, a theoretical physicist at the University 
of Oxford, UK, and former head of CERN, 
Europe's particle physics laboratory near 
Geneva, Switzerland. “If they try to take on 
even more and be too ambitious with new 
ideas, they might let go of what they've got.” 


CURIOUS CHILD 

Adhikari could be a poster child for the ICTP. 
The youngest of six siblings, he was born to 
farming parents in a village near Nepal’s 
second-largest city, Pokhara, and grew up 
with paraffin-oil lamps and no running water 
at home. His father was literate, his mother was 
not — but both parents supported his desire 
to study. “I am very curious to unearth the 
secrets of nature — so I love physics,” he says.
He worked as a teacher for three years to earn 
enough money to study at TU. 

In 1996, having completed his under- 
graduate and master’s degrees in physics, 
Adhikari won a place on the ICTP’s diploma 
programme. When he travelled to Trieste, 
aged 27, he felt as if he had landed on a differ- 
ent planet. “I was astonished by the Western 
world — there was no dust in the air!” he says. 
Adhikari met Nobel laureates and other distin- 
guished physicists, who come to the ICTP to 
collaborate and teach. 

After finishing the diploma, he did a 
PhD at the Martin Luther University of 


Halle-Wittenberg in Germany, simulating the 
behaviour of polymers and other materials. This 
was followed by postdocs in the United States 
and Germany. “Our life was good, and there
was clean drinking water,” says Adhikari’s wife,
Sabitra. “But one day Narayan told me: ‘We have
to go back.’” Adhikari had always felt strongly
that he wanted to use his knowledge “to make
Nepal a better place”, he says, and this aim
was reinforced during his diploma at the ICTP.

When Adhikari rejoined TU in 2006, he set 
about building his own research group. He had 
no problem finding willing students; what he 
did not have was books, the Internet, a good 
electricity supply or any equipment. That ruled 
out experimental physics, but it allowed him 
to continue his theoretical work, which he did 
by buying a suite of desktop computers with 
funding from the ICTP. 

Soon Adhikari was publishing his studies, 
which modelled the properties of materials 
ranging from water to polymers and solids such 
as graphene. In the past two years, for example,
he has explored how graphene might be used
to store energy by decorating it with metal — 
a study that he estimates took three times as 
long as it would have in the West, because of the 
power cuts that routinely stopped work. “The 
conditions were so difficult that sometimes I 
was afraid that I'd never achieve anything in 
Kathmandu,” he says. “But I just kept thinking
that I had to continue, because it'd be great to
develop science in Nepal.” At the time, few 
scientists at TU were publishing consistently in 
international journals, but Adhikari’s enthusi- 
asm seeped into the rest of his department. In 
the 40 years before 2006, just 4 students had 
completed a PhD there; ambitious graduates 
usually went to Europe or the United States. 
Since Adhikari joined, 22 students have been 
admitted to the PhD programme and other 
researchers have published more, too. “What 
he has helped us to achieve is really remark- 
able,” says Binil Aryal, head of physics at TU. 


THE GREATER GOOD 

But does Nepal need a theoretical-physics 
department? After all, the country has more 
urgent issues: its population struggles with 
malnutrition, its infrastructure is falling apart, 
and its air quality ranks among the worst in the 
world. “In developing countries like Nepal, the 
government does not allocate sufficient budget 
for R&D because of much more pressing
problems and priorities,” says Ganesh Shah, Nepal's
science minister from 2008 to 2009. 

Shah and Adhikari say that building up the 
intellectual capacity of the country will drive 
its economic development. “Investment in 
science, technology and innovation is required 
to create jobs and reduce poverty and improve 
the living standards of the people,” says Shah.
When he was science minister, he tried to allocate
more funding for basic research, he says
— but with limited success. The Nepali gov- 
ernment invested 0.3% of its gross domestic 
product in research and development in 2010, 
similar to that of other developing countries 
in south Asia but well below the nearly 2% 
invested by China. Theoretical physics is a lot 
easier and cheaper to set up than some other 
fields, Shah points out. 

Adhikari is paid by the university, but he still 
receives some support from the ICTP. Until 
this year, his students had to fly to comput- 
ing facilities in Kolkata, India, every time they 
had a complex computation to perform. Not 
anymore. Gopi Kaphle, one of Adhikari’s PhD 
students, proudly shows off a shoebox-sized 
computer. “It performs computations about 
ten times faster than the machines we used to 
have,” says Kaphle. Because calculations on the 
new computer must run without interruption, 
the ICTP also funded a solar panel on the roof of 
the department, to deal with Nepal's power cuts. 

This year, Adhikari decided that he wanted 
to expand into relatively simple, tabletop 
experiments in nanoscale materials. “We 
have to be able to do experiments; it’s the next 
step forward,” he says. To try to negotiate the
funds, he returned to the ICTP. He arrived at 
the headquarters in Trieste in late September, 
just as the centre was getting ready to celebrate 
its 50th birthday. 


BREAKING DOWN BARRIERS 

The seeds of the ICTP were planted after the 
Second World War, when physicists includ- 
ing Albert Einstein, Robert Oppenheimer and 
Niels Bohr championed the concept of a United
Nations-backed centre to promote peaceful 
nuclear-physics research. Initially, this led to 
the creation of the International Atomic Energy 
Agency (IAEA). But for Abdus Salam, a science 
prodigy from Pakistan who had been made a
physics professor at Imperial College London
by the age of 31, that was not enough.

Narayan Adhikari (centre, in pale blue shirt and black trousers) with students from the physics department at Tribhuvan University.

Speaking to the IAEA’s General Conference
in 1960, he outlined his idea for an IAEA- 
backed organization that would promote 
theoretical-physics research in the developing 
world and bridge East and West in the cold war. 
In the audience was Paolo Budinich, head of 
physics at the University of Trieste, who shared 
the dream. The two men initially encountered 
resistance to the idea of building a new cen- 
tre; critics argued that it would be easier and 
cheaper for developing-world physicists to visit 
existing labs in the developed world. But Salam 
and Budinich won the argument, not least after 
they secured the financial backing of the Italian
government and the support of the IAEA
and the United Nations Educational, Scientific 
and Cultural Organization (UNESCO). They 
chose to locate the centre in Trieste, which was 
politically symbolic because it sat right next to 
the Iron Curtain that divided East and West. 
When the institute opened in 1964, it rapidly 
established itself as a place for high-level 
research and training, welcoming scientists 
from both sides of the Iron Curtain and from 
farther afield. The centre, which initially offered 
scientists a two-to-three-month grant to work 
in Trieste, “was like a source of oxygen to Third 
World scientists”, says Abdelkrim Aoudia, a
geophysicist from Algeria who works at the ICTP.
Even in the institute's early days, many Nobel 
laureates served as visiting professors. When, 
in 1979, Salam shared a Nobel prize with 
Sheldon Glashow and Steven Weinberg for 
the unification of electromagnetism and the 
weak nuclear force, the organization's pres- 
tige skyrocketed. Speaking at the anniversary 
celebrations, Salam’s son Ahmad, an invest- 
ment banker at EME Capital in London, wiped 
away tears as he remembered the sacrifices his 
father made while he set up the centre — not 
least spending little time with his children.
“He had a much bigger mission in life,” said
Ahmad.

Today, around 2,500 developing-world 
scientists visit the ICTP each year. About 50 of 
these enrol in the one-year diploma, an intense 
predoctoral education programme taught by 
experts from around the world. (The institute 
identifies students through both an application 
process and the recommendations of research- 
ers and teachers.) Many of the rest — including 
Adhikari — are part of the Associates Scheme, 
which supports scientists from developing 
countries to make regular visits to the ICTP, 
where they network and update their skills. 
What makes the institute successful, say those 
involved, is its focus on nurturing talented 
scientists and keeping them connected to the 
international community, while encouraging 
them to continue research at home. 


BRAIN GAIN 

That approach is working, says Fernando 
Quevedo, the ICTP’s director. Three-quarters 
of the students who have completed the 
diploma programme have received PhDs, or 
are working towards them, and more than half 
of those who complete PhDs go back to their 
home countries (see ‘Sticking with science’). 
More than 90% of associates remain in their 
home countries for their careers. Some, inevi- 
tably, do end up abroad, but even in those 
cases, the ICTP often claims success. One of 
the world’s leading string theorists, Argentinian
Juan Maldacena, who works at the Institute
for Advanced Study in Princeton, New Jersey, 
attributes his achievements in part to the ICTP, 
because of the training that he and his master’s 
supervisor received at the centre. 

The ICTP’s journey has not been entirely 
smooth, however. “When Salam passed 
away, ICTP had a period to recover from the 
founder's death, but they managed,” says David 
Gross, a string theorist at the University of
California, Santa Barbara, who often visits the
institute. Keeping the money flowing has been 
difficult — especially in light of the institute's 
growth into new fields. 

The satellite campuses that it has been 
launching, mostly supported by the host 
countries, are designed to improve postgrad- 
uate education in physics and mathematics, 
as well as to conduct research and training 
in topics that serve regional interests and 
strengths. The centre in São Paulo, Brazil, for
instance, focuses on pure theory, whereas the 
one in Chiapas, Mexico, includes climate and 
renewable energy. When it comes to further 
expansion, Quevedo says, the institute insists 
on quality over quantity and is careful to evalu- 
ate each proposal. It has also made it a priority 
to recruit more women into its programmes. 
Since 2001, the average proportion of female 
scientists visiting or studying on its campus 
has been 20%, but the balance is better in the 
2013-14 diploma programme, in which half of 
the participants are women. 

All of these activities take money. The Italian
government still covers about 80% of the Tri- 
este centre’s annual budget of about €30 mil- 
lion (US$37 million), with a major chunk of 
the rest provided by the IAEA and UNESCO. 
(UNESCO has also had responsibility for the 
centre’s administration since 1996.) “Italy 
deserves a lot of credit for sticking with the 
organization over the years through all their 
financial crises,” says Gross. But the government
is keen for the ICTP to find new funding
sources, and in 2013 the institute created an 
office dedicated to seeking additional funding 
from elsewhere. With many applications for 
every available training slot, “the main chal- 
lenge is to attract funds to be able to fund more 
students”, says Quevedo.

Nobel-prizewinning physicist Abdus Salam campaigned for a centre to support developing-world physics.

STICKING WITH SCIENCE

Most people who get diplomas from the International Centre for Theoretical Physics (ICTP) pursue further study, and more than half who get PhDs return to their home countries.

Of the 720 people who have completed the ICTP postgraduate diploma since 1991:
425 (59%) received PhDs
117 (16%) are working on PhDs
21 (3%) received master's degrees
16 (2%) are master's students
Unknown: either lost contact or are in the process of applying for further study.

The centre has also had to adapt to
geopolitical changes. Back at the start, when it
was important to bridge the East-West divide, 
the institute offered neutral ground for Soviet 
and US physicists. Today the bridges are built 
between developed countries in the global north 
and more impoverished or politically isolated 
ones in Africa, South America and south Asia. 
The institute is one of very few places to have 
helped scientists from North Korea to meet and 
study with other researchers, for example, says 
ICTP cosmologist Paolo Creminelli. “These 
researchers represent a connection between 
North Korea and the rest of the world.”
Elsewhere, several other institutions have 
been built on the ICTP model, including 
the International Centre of Physics (CIF) in 
Bogota, which since its establishment in 1985 
has supported physics research in Colombia 
and surrounding countries. There is a great
need for ICTP-type programmes in natural
sciences, engineering and other technical sci- 
ences, says Torsten Wiesel, president emeritus 
of Rockefeller University in New York City, 
who has worked to advance developing-world 
science. “The world needs more programmes
reaching out over the borders into countries 
of need,” he says. 

Some researchers argue that the ICTP itself 
should go further. It should “develop research 
schemes and programmes with direct, spe- 
cific and relevant applications in engineer- 
ing, industry and medicine in the developing 
world”, says Estelle Maeva Inack, a condensed-matter
physicist from Cameroon who works
at the ICTP. Quevedo says that the institute 
is aware of this need, and that it is one of the 
reasons for expanding into more applied 
disciplines. He also points to a popular course 
on entrepreneurship for physicists, which the 
ICTP runs in collaboration with partner insti- 
tutes around the world. “But our main mission 
is to promote excellence in science in develop- 
ing countries and we should continue being 
faithful to this mandate,” he says. 

That is what got it this far, after all. “The first 
challenge of every institution is survival,” says
Quevedo, “and ICTP has survived for 50 years.” 


HEADING HOME 

The anniversary celebrations over, Adhikari 
talks to his students by phone as he gets ready 
to leave Trieste. It has been raining a lot in 
Nepal, which has rendered the solar panels 
rather useless — and has made work hard for 
Kaphle, who is getting ready to defend his PhD 
thesis in a few weeks. 

But Adhikari is not put out. His proposal 
for tabletop physics went down well, and now 
discussions are under way at the ICTP to see 
whether he can receive the funds he would 
like. “I owe a lot to the organization,” he says,
and he is optimistic that science will appeal to 
other bright students in Nepal. He wants to see 
children in villages doing homework on com- 
puters, illuminated by electric lights, rather 
than the oil lamps that he once used. “I hope 
one day our students in Nepal will be able to 
find answers to some really big problems in 
physics.” 

And there is no reason why they shouldn't, 
says Gross, with a worldwide pool of talent 
just waiting to be tapped. “There are brains 
everywhere, in roughly the same propor- 
tion of the population — as long as they get 
a chance.”


Katia Moskvitch is a science writer in London 
and an International Development Research 
Centre fellow at Nature.

A woman in Jharkhand, India, burns raw coal into charcoal, which emits toxic gases that harm her health and affect the climate.


Clean up our skies 


Improve air quality and mitigate climate change simultaneously,
urge Julia Schmale and colleagues. 


In December, the world’s attention will
fall on climate-change negotiations
at the 20th United Nations Framework
Convention on Climate Change
(UNFCCC) Conference of the Parties in
Lima, Peru. The emphasis will be on reducing
emissions of long-term atmospheric
drivers such as carbon dioxide, the effects
of which will be felt for centuries. At the
same time, the mitigation of short-lived
climate-forcing pollutants (SLCPs) such as
methane, black carbon and ozone, which
are active for days or decades, must be
addressed (see ‘Compounds of concern’).

SLCPs cause poor air quality and are 
responsible for respiratory and cardiovascu- 
lar diseases. Particulate matter in the atmos- 
phere is the leading environmental cause of 
ill health, and air pollution is causing about 
7 million premature deaths annually. Interactions
between warming, air pollution and
the urban heat-island effect (which causes 
cities to be markedly warmer than their 
surrounding rural areas) will raise health 
burdens for cities worldwide by mid-century.
Air pollution also damages ecosystems and 
agriculture. 

Current air-quality legislation falls short.
Existing measures would prevent just
2 million premature deaths by 2040. We 
estimate that around 40 million more such 
deaths would be avoided if concentrations 
of methane, black carbon and other air pol- 
lutants were halved worldwide by 2030 (see 
‘Clean air’). 

This is not an ‘either-or’ decision: 
coordinated action on both climate change 
and air pollution is necessary. And it is trac- 
table: for example, electric-car sharing or 
shifting from fossil fuels to renewable power 
generation would reduce consumption and 
overall emissions and lead to behavioural
shifts that are beneficial in both the near
and long term.

But defining joint CO₂ and SLCP reduction
goals is difficult. Researchers need to
spell out the benefits and trade-offs of sepa- 
rate and joint air-pollution and climate- 
change mitigation in terms of public health, 
ecosystem protection, climate change and 
costs. A suite of mitigation policies must be 
designed and applied on all scales — from 
cities to the global arena. 


DOUBLE JEOPARDY 

Studies estimate that rigorous reductions
of global methane and black-carbon-related 
emissions by 2030 could prevent around 
2.4 million premature deaths per year that 
result from air pollution, and save 50 mil- 
lion tonnes of crops through avoided ozone 
damage (methane is a precursor for ozone 
production). Global mean temperature rise 
would be slowed by about 0.5 °C by mid- 
century. The rate of sea-level rise would be 
reduced by 20% in the first half of this cen- 
tury by such measures alone, and by 50% in 
the second half if CO₂ and SLCP mitigation
are combined.

Lower air pollution also has societal 
benefits. Methane captured from landfills or 
manure can be used to run residential stoves, 
for example. In developing countries, replac- 
ing conventional cooking stoves with clean- 
burning technologies allows people — women 
and children, in particular — to invest time 
in education or financially rewarding work, 
rather than spending time collecting wood or 
other materials for basic family needs.


COMPOUNDS OF CONCERN 


All SLCPs must be reduced in concert.
Sulphate aerosols cool the climate, as
happens following volcanic eruptions. But
delaying sulphur dioxide mitigation as a way to
temporarily mask global warming is problematic.
Greater stresses on people's health
and the environment already result from
today’s enhanced particulate concentrations
and acidified rain.

Coordinated action to mitigate SLCPs
and CO₂ is hampered by fragmented
policies. For example, energy ministries
tend to focus on CO₂ reductions
and environment ministries manage air
quality. Greenhouse gases are subject to
global agreements, whereas air pollutants
are more usually limited locally by
legislation. Regulation of different climate-forcing
compounds is patchy.
Anthropogenic emissions of methane are
predicted to increase by about 25% (more
than 70 million tonnes annually) by 2030, yet
the gas is hardly regulated. Methane is covered
by the Kyoto Protocol, but most countries’
controls focus on CO₂. In the European
Union (EU), for example, methane is not cov- 
ered by the national emissions ceiling direc- 
tive, the directive on ambient air quality or 
the EU Emissions Trading System. The EU’s 
industrial emissions directive omits major 
sources of the gas, such as cattle farming. 
Air-quality policies in the EU and
the United States have been partially
successful in reducing periods of extreme 
ozone concentration. But average regional 
concentrations have not declined in the 
past two decades across Europe, and there 
is still no legally binding limit, only a target. 
Trends in the United States are mixed and 
vary seasonally; in east Asia, surface ozone 
is increasing. 

For black carbon, there are almost no 
regulatory obligations to report emissions 
or measure ambient concentrations. Few 
regional and local assessments have been 
made. Little change in global black carbon 
emissions is predicted by 2030, because 
reductions in North America, Europe and 
northeast and southeast Asia and the Pacific 
will be offset by increases in south, west and 
central Asia and in Africa.

Unlinked and narrow air pollution and 
climate-policy interventions can have mixed 
results on both fronts. In the EU, for exam- 
ple, legislated vehicle-emissions limits have 
reduced particulate concentrations by 45% 
between 1995 and 2008 and are projected to 
reduce black carbon by more than 90% by 
2025 compared with 2000. Yet CO₂ emissions
from the ever-growing transport sector
are rising. And air quality is not under
control. Unregulated residential emissions 
from biomass heating are rising, and will 
account for 80% of black-carbon emissions 
in Europe in 2025. 

Also problematic are lax targets. For
example, the annual EU limit for particulate
matter smaller than 2.5 micrometres
(PM2.5) that will be binding by 2015 is
2.5 times higher than that recommended
by the World Health Organization (WHO).
And the current PM10 (particulates smaller
than 10 micrometres) limit is twice that recommended
by the WHO. If the EU meets its
limit on PM10, no further action to meet the
legal requirements will be needed, because
the PM2.5 value will also be met.

Common air pollutants and industrial chemicals have major influences on the climate, human health and agriculture even though they persist for only a short time in the atmosphere.

CLEAN AIR

More than 40 million deaths from respiratory and cardiovascular diseases could be prevented by 2030 by halving the concentration of short-lived climate-forcing pollutants (SLCPs) in the atmosphere immediately (a). Joint approaches to mitigating SLCPs and carbon dioxide are more effective than separate measures in limiting global average temperature rise (b).

Some coordinated efforts to reduce air 
pollution and slow climate change have 
begun. The Climate and Clean Air Coali- 
tion to Reduce Short-Lived Climate Pollut- 
ants (CCAC), formed in 2012, now includes 
42 nations, the European Commission 
and more than 50 organizations. It focuses 
on mitigating methane and black-carbon 
emissions for transport, brick, oil and nat- 
ural-gas production, household cooking 
and heating. Since 2009, the Arctic Council
has run task forces to reduce black-carbon and
methane emissions to slow climate change 
in the region, and has produced two reports 
in addition to a scientific assessment of black 
carbon in the Arctic. But so far, only Nor- 
way has developed a national action plan to 
reduce SLCPs. 

None of these efforts addresses structural 
and behavioural changes. Coordinated 
action to reduce SLCPs and CO₂ simultaneously
is not an objective, because it is
assumed that parallel reductions will happen 
under different policy umbrellas. 


DOUBLE DUTY 
Effective mitigation of SLCPs will require 
detailed assessments of the multiple impacts 
of emitted air pollutants together with CO₂,
their sources, their atmospheric interactions
and their potential for mitigation.
Combined efforts at the city and state 
level will be particularly important because 
this is where most people are exposed to air 
pollution, and 75% of global CO₂ emissions
are generated in cities. Positions and task
forces should be created to promote joint 
emissions-reduction strategies across 
municipal and regional departments. For 
example, climate policies that encourage
combined heat and power plants with low
power capacities for cities (thus potentially
exempting them from air-quality
regulations) should be avoided.

Scaling up and coordinating local efforts 
and national strategies are necessary. For 
example, local efforts in the Arctic can be 
only partly effective because the region is 
subject to imported pollution from the resi- 
dential and transport sectors of countries at 
lower latitudes. 

Global organizations such as the CCAC,
the World Meteorological Organization
and the WHO could assume coordinating
roles. Arctic Council member states should
take a leadership role in national actions
to reduce black carbon and methane at
their next ministerial meeting in 2015. The
European Commission should propose
ambitious emissions limits for methane to
the national emissions ceiling directive.

It is important that steps to limit SLCPs do 
not distract from CO₂ mitigation, and vice
versa. We calculate, building on work by
D.S. and colleagues, that a delay of 20 years
in reducing CO₂ emissions would result
in 0.4°C more warming by the end of the 
century than if measures were put in place 
immediately, with the result that the 2°C 
temperature mark would be crossed in the 
mid-2060s rather than just after 2100 (see 
‘Clean air’). 




The 2015 Conference of the Parties meeting
in Paris needs to pursue its primary mission
to reduce CO₂ for the climate’s sake. That
said, the scientific community must speak
out against recommendations, explicit or
implicit, to exclude SLCPs from discussions
of climate-change mitigation or to delay
their reduction. Tens of millions of lives are
at stake, along with damage to agriculture, 
ecosystems and cultural heritage.

Erika von Schneidemesser is a project

Local effects such as thunderstorms, crucial for predicting global warming, could be simulated by fine-scale global climate models.


Build high-resolution 
global climate models 


International supercomputing centres dedicated to climate prediction 
are needed to reduce uncertainties in global warming, says Tim Palmer. 


The drive to decarbonize the global
economy is usually justified by appealing
to the precautionary principle:
reducing emissions is warranted because the
risk of doing nothing is unacceptably high.
By emphasizing the idea of risk, this framing
recognizes uncertainty in the magnitude and
timing of global warming.

This uncertainty is substantial. If warming 
occurs at the upper end of the range projected 
in the Intergovernmental Panel on Climate 
Change (IPCC) Fifth Assessment Report,
then unmitigated climate change will prob- 
ably prove disastrous worldwide, and rapid 
global decarbonization is paramount. If 
warming occurs at the lower end of this range, 
then decarbonization could proceed more 
slowly and some societies’ resources may be 
better focused on local adaptation measures. 

Reducing these uncertainties substantially 
will take a new generation of global climate 
simulators capable of resolving finer details,
including cloud systems and ocean eddies.
The technical challenges will be great, requir- 
ing dedicated supercomputers faster than the 
best today. Greater international collabora- 
tion will be needed to pool skills and funds. 

Against the cost of mitigating climate 
change — conceivably trillions of dollars 
— investing, say, one quarter of the cost of 
the Large Hadron Collider (whose annual 
budget is just under US$1 billion) to reduce 
uncertainty in climate-change projections is 
surely warranted. Such an investment will also 
improve regional estimates of climate change 
— needed for adaptation strategies — and our 
ability to forecast extreme weather. 


GRAND CHALLENGES 

The greatest uncertainty in climate projec- 
tions is the role of the water cycle — cloud 
formation in particular — in amplifying or 
damping the warming effect of CO₂ in the
atmosphere. Clouds are influenced strongly
by two types of circulation in the atmosphere:
mid-latitude, low-pressure weather
systems that transport heat from the tropics 
to the poles; and convection, which conveys 
heat and moisture vertically. 

Global climate simulators calculate the 
evolution of variables such as temperature, 
humidity, wind and ocean currents over a 
grid of cells. The horizontal size of cells in 
current global climate models is roughly 
100 kilometres. This resolution is fine 
enough to simulate mid-latitude weather 
systems, which stretch for thousands of kilo- 
metres. But it is insufficiently fine to describe 
convective cloud systems that rarely extend 
beyond a few tens of kilometres. 
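
To get a feel for these scales, the short Python sketch below counts how many horizontal cells a global grid needs at a given cell size, and how many cells span a weather feature. It is an illustration only: it assumes a spherical Earth of mean radius 6,371 km tiled by equal-area square cells, and the function names and the 2,000-km and 30-km feature sizes are hypothetical choices for the example, not values from any particular climate model.

```python
import math

# Assumption: spherical Earth with mean radius 6,371 km.
EARTH_RADIUS_KM = 6371.0
EARTH_SURFACE_KM2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2


def horizontal_cells(cell_size_km: float) -> int:
    """Approximate number of square cells of this size needed to tile the globe."""
    return round(EARTH_SURFACE_KM2 / cell_size_km ** 2)


def cells_across(feature_km: float, cell_size_km: float) -> float:
    """How many grid cells fit across a feature of the given width."""
    return feature_km / cell_size_km


if __name__ == "__main__":
    for size in (100.0, 1.0):
        print(f"{size:>5.0f} km cells: {horizontal_cells(size):,} per model level")
    # Hypothetical feature sizes, chosen to match the orders of magnitude in the text.
    print("2,000-km weather system, 100-km grid:", cells_across(2000, 100), "cells across")
    print("30-km convective system, 100-km grid:", cells_across(30, 100), "cells across")
```

On a 100-kilometre grid a convective system is smaller than a single cell, which is why its effects have to be approximated rather than simulated directly.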

Simplified formulae known as ‘parameterizations’
are used to approximate the
average effects of convective clouds or 
other small-scale processes within a cell. 
These approximations are the main source 
of errors and uncertainties in climate
simulations. As such, many of the parameters
used in these formulae are impossible
to determine precisely from observations 
of the real world. This matters, because 
simulations of climate change are very sen- 
sitive to some of the parameters associated 
with these approximate representations of 
convective cloud systems.

Decreasing the size of grid cells to 1 kilo- 
metre or less would allow major convective 
cloud systems to be resolved. It would also 
allow crucial components of the oceans to be 
modelled more directly. For example, ocean 
eddies, which are important for maintaining 
the strength of larger-scale currents such as 
the Gulf Stream and the Antarctic Circum- 
polar Current, would be resolved. 

The goal of creating a global simulator 
with kilometre resolution was mooted at a 
climate-modelling summit in 2009. But no
institute has had the resources to pursue it. 
And, in any case, current computers are not 
up to the task. Modelling efforts have instead 
focused on developing better representations 
of ice sheets and biological and chemical pro- 
cesses (needed, for example, to represent the 
carbon cycle) as well as quantifying climate 
uncertainties by running simulators multiple 
times with a range of parameter values. 

Running a climate simulator with 1-kilo- 
metre cells over a timescale of a century will 
require ‘exascale’ computers capable of handling
more than 10^18 calculations per second.
Such computers should become available 
within the present decade, but may not 
become affordable for individual institutes 
for another decade or more. 
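
A rough scaling argument shows why this step is so demanding. The sketch below assumes, as a common rule of thumb rather than a figure from this article, that refining the horizontal grid by a factor r multiplies the number of cells by r² and, because the time step must shrink roughly in proportion to the cell size, the number of time steps by r. It ignores changes in vertical resolution and in the physics itself, and `relative_cost` is a hypothetical helper written for this illustration.

```python
def relative_cost(coarse_km: float, fine_km: float, time_exponent: float = 1.0) -> float:
    """Estimated compute-cost multiplier for moving from coarse to fine horizontal cells.

    Assumes cost scales as r**2 (more cells) times r**time_exponent (more,
    shorter time steps), where r = coarse_km / fine_km. This is a rule of
    thumb, not a measured figure.
    """
    if coarse_km <= 0 or fine_km <= 0:
        raise ValueError("cell sizes must be positive")
    refinement = coarse_km / fine_km
    return refinement ** (2.0 + time_exponent)


if __name__ == "__main__":
    factor = relative_cost(100.0, 1.0)
    print(f"100 km -> 1 km: roughly {factor:.0e} times more computation")
```

Under these assumptions, moving from 100-kilometre to 1-kilometre cells costs about a million times more computation for the same simulated period.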


CLIMATE FACILITIES 

The number of low-resolution climate simu- 
lators has grown: 22 global models contrib- 
uted to the IPCC Fourth Assessment Report 
in 2007; 59 to the Fifth Assessment Report 
in 2014. European climate institutes alone 


contributed 19 different climate model integrations
to the Fifth Assessment database (go.nature.com/3gu8co).
Meanwhile, systematic
biases and errors in climate models have been 
only modestly reduced in the past ten years.


It is time to establish a small number of
international climate-prediction facilities,
in which climate institutes, weather-forecast
centres and academic departments can com- 
bine resources and talents to create the first 
cloud-resolved global climate simulators 
within a decade. Focusing on fewer simula- 
tors, perhaps one per continent, would avoid 
duplication and concentrate the large num- 
ber of individually poorly resourced efforts, 
yet maintain a competitive environment to 
encourage scientific innovation. 

The success of the European Centre for 
Medium-Range Weather Forecasts, an intergovernmental
effort, is a good example. The
centre was set up in the 1970s to produce 
weather forecasts up to ten days ahead using 
a global weather model. From the beginning, 
its forecasts have been the envy of the world. 
Funding from the centre's 34 member states 
enables human talent to be drawn from 
across Europe with jointly funded super- 
computing infrastructure. 

This concept now needs to be applied to 
climate prediction. A budget of a few hun- 
dred million euros a year from European 
governments, the European Union and perhaps
the private sector could support such a
centre in Europe. A multi-agency initiative 
might establish a facility in North America. 
Leading countries in climate research such 
as China, India, Japan and Korea might 
jointly fund a facility in Asia. 

Simulation of convective cloud systems in a limited-area high-resolution climate model.

Computational challenges will have to be
overcome. For example, for software to run
efficiently on exascale computers comprising
a million or more independent processing
elements, only essential information can
be passed between processors, and from
processor to memory. Climate and computer
scientists will need to assess the physical
information content in the millions of
climatic variables described. This will also
be relevant in deciding at what level of detail
the plentiful model data must be archived.
Computer hardware will need to evolve to
allow the efficient computation, transmission
and storage of model variables with a
range of numerical precision.
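
The trade-off between storage and numerical precision can be made concrete with a small sketch. This is an illustration under stated assumptions, not a description of any climate model's data format: the temperature value is hypothetical, and Python's standard `struct` module is used only to round-trip one number through IEEE 754 half, single and double precision.

```python
import struct


def round_trip(value: float, fmt: str) -> tuple:
    """Pack a value in the given struct format, unpack it, and
    return (bytes used, absolute error introduced)."""
    packed = struct.pack(fmt, value)
    restored = struct.unpack(fmt, packed)[0]
    return len(packed), abs(restored - value)


temperature_k = 288.15  # hypothetical near-surface temperature in kelvin

# "<e", "<f" and "<d" are little-endian half, single and double precision.
for name, fmt in (("half", "<e"), ("single", "<f"), ("double", "<d")):
    size, error = round_trip(temperature_k, fmt)
    print(f"{name:>6}: {size} bytes, absolute error {error:.6f}")
```

Halving the bytes per value halves the data that must be moved and stored, at the price of a larger rounding error, which is why the information content of each variable matters when choosing its precision.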

Even with 1-kilometre cells, unresolved 
cloud processes such as turbulence and the 
effects of droplets and ice crystals will have to 
be parameterized (using stochastic modelling 
to represent uncertainty in these parameterizations).
How, therefore, can one be certain
that global-warming uncertainty can be 
reduced? The answer lies in the use of ‘data 
assimilation’ software — computationally 
demanding optimization algorithms that use 
meteorological observations to create accu- 
rate initial conditions for weather forecasts. 
Such software will allow detailed comparisons 
between cloud-scale variables in the high- 
resolution climate models and correspond- 
ing observations of real clouds, thus reducing 
uncertainty and error in the climate models.
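
The core idea of data assimilation, blending a model estimate with an observation according to their uncertainties, can be sketched for a single variable. This is a deliberately simplified scalar update of the optimal-interpolation type, not the large-scale optimization algorithms used by forecasting centres; the function name and all numbers are hypothetical.

```python
def assimilate(background, background_var, observation, observation_var):
    """Combine a model estimate (background) with an observation,
    weighting each by the inverse of its error variance."""
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Hypothetical example: the model says 20.0 with error variance 4.0,
# an instrument reports 22.0 with error variance 1.0.
analysis, analysis_var = assimilate(20.0, 4.0, 22.0, 1.0)
print(f"analysis = {analysis:.2f}, variance = {analysis_var:.2f}")
```

The analysis lies closer to the more trustworthy observation, and its variance is smaller than that of either input, which is the sense in which comparison with observations reduces uncertainty.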

High-resolution climate simulations 
will have many benefits beyond guiding 
mitigation policy. They will help regional 
adaptation, improve forecasts of extreme 
weather, minimize the unforeseen conse- 
quences of climate geoengineering, and be 
key to attributing current weather events to 
climate change. 

High-energy physicists and astronomers 
have long appreciated that international 
cooperation is crucial for realizing the 
infrastructure they need to do cutting-edge 
science. It is time to recognize that climate 
prediction is ‘big science’ of a similar league.
HISTORY OF SCIENCE

Frontispiece from Museum Wormianum (1655) by antiquary Ole Worm, showing his cabinet of curiosities, a collection of fossils and other natural artefacts.

Pursuing the primordial


Ted Nield ponders a history of how European science came to grasp Earth’s age. 


Three things annoy Martin Rudwick
about how the history of Earth science
is portrayed. He scorns monoglot pro- 
vincialism, caricatures that pit science against 
religion — and hero-worship. So I hope he 
forgives the fact that in 1977, at 21, I made 
a pilgrimage to London to hear him speak 
at the Geological Society, and to ask him to 
autograph my copy of his Living and Fossil 
Brachiopods (Humanities Press, 1970). 
Rudwick had just switched from studying 
palaeontology and functional morphology 
— which uses engineering principles to make 
sense of the sometimes perplexing three- 
dimensional geometry of fossil skeletons — 
to the history of science. In this he has forged 
a second, even more distinguished career. 
Because the subject is also an enthusiasm of 
mine, I have followed his work with an appre- 
ciation that remains undimmed after reading 
his latest book, Earth’s Deep History.
This traces the origin of historical science 
in the seventeenth century, when the things 
we see around us in nature came to be seen as
‘monuments’, pregnant with historical meaning,
like archaeological relics. With his talent
for encapsulating pre-modern mindsets,
Rudwick deftly explains how ideas of natural
history were embedded in cultural history.
He concentrates on thinking in the late
eighteenth century, not only in Anglophone
countries but, crucially, also in mainland
Europe, especially France. The book’s premise,
which has been used before by Rudwick and
others (including the late evolutionary
biologist Stephen Jay Gould), is that
humanity’s discovery of Earth’s immense
age is a step in science’s progressive removal
of humans from the centre of things. First
our planet was relegated to mere third rock
from the Sun; then humans were transformed
from the pinnacle of God’s creation into
twigs on an evolutionary bush.

Earth’s Deep History: How It Was Discovered and Why It Matters

Rudwick’s early brachiopod book drew 
on material originally expounded in papers, 
and in this respect Earth’s Deep History is its
cousin. In 2005 and 2008, respectively, Rud- 
wick published his magisterial tomes Bursting 
the Limits of Time and Worlds Before Adam 
(both University of Chicago Press). These 
burst the limits of my briefcase and contrib- 
uted to my upper-body strength. It is there- 
fore welcome that their arguments have been 
condensed into a more portable account of 
the human appreciation of time. Unlike many 
authors (including Charles Darwin) whose 
big books were conceived as ‘sketches’ for 
never-completed longer works, Rudwick has 
sensibly done things the right way round. 
Beginning with Irish Archbishop James
Ussher’s 1650 publication of a chronology
suggesting that the world began on
23 October 4004 BC, Rudwick shows how, by
the eighteenth century, Western culture had
long accepted that Earth had been around for
millennia. Ussher was not alone: Isaac Newton
played the same game, suggesting a date
of 3988 BC. Rudwick is at pains to emphasize
that Ussher was a serious chronologist who
did not deserve his post-Darwinian ridicule.
What these chronologies show is that humanity
was at that time assumed by all to have been
part of the Universe from its inception.

PRIVATE COLLECTION/BRIDGEMAN IMAGES


Rudwick goes on to reveal how natural 
philosophers such as Jean-André Deluc and 
Johann Jakob Scheuchzer in Switzerland 
arrived at a truer picture. In attempting to 
reconcile scriptural and other textual evi- 
dence with that slowly emerging from nature's 
monuments, they came to realize that Earth 
had had a long prehistoric existence for which
there was no documentary evidence. Yet far 
from being stifled by what had gone before, 
they were profoundly aided by the work of 
traditional, historical and antiquarian schol- 
ars working in the Judaeo-Christian tradition. 
The image of emergent science heroically 
struggling against obscurantist religion is a 
fiction conjured by post-Darwinian revision- 
ism and militant atheists, Rudwick insists. 

Later natural philosophers, reading nature 
as innately historical, saw further. For Dar- 
win, species were not finished objects in neat 
taxonomic boxes; they represented the cut 
ends of historical threads, linking all to the 
origin of life. Most people today would cat- 
egorize Darwin as a biologist, but his view of 
species derived from his geologist’s instinct 
that all things embody a historical narrative. 
The realization that much of Earth’s history 
was not just prehistoric but prehuman gave 
birth to what we now call deep time. The book 
concludes with a relatively breezy scamper 
through the subsequent history of Earth sci- 
ence, taking in the 1960s and 70s arrival of its 
grand unifying theory, plate tectonics. 

Reading Rudwick’s prose is a pleasure, but 
this is not a ‘popular’ book. Rudwick provides 
little human interest behind the names, so if 
these do not already conjure up real human 
beings with lives and idiosyncrasies, he offers 
scant help. Indeed, he has few good words to 
say about the stylistic compromises of popular 
histories. I find this a trifle ungallant. Superior 
art, for all its academic shortcomings, engages 
more minds than the diligent knight on his 
charger of scholarship ever will.


Ted Nield is the editor of Geoscientist 
magazine in London. His latest book is 
Underlands (Granta). 

e-mail: ted.nield@geolsoc.org.uk 


Books in brief 


The Singular Universe and the Reality of Time 

Roberto Mangabeira Unger and Lee Smolin CAMBRIDGE UNIVERSITY 
PRESS (2014) 

The poor fit between relativity and the quantum impedes our 
understanding of the Universe. Now philosopher Roberto Unger and 
theoretical physicist Lee Smolin propose a new model resting on
three assumptions: time is real; mathematics is a limited tool; and
there is only one Universe at a time. Smolin’s is the briefer, arguably
more focused section of this hefty explication, setting out clear
agendas for research into quantum foundations, explanations for
the ‘arrow of time’ and other parts of this puzzle.


p53: The Gene that Cracked the Cancer Code 
Sue Armstrong SIGMA (2014) 


As science writer Sue Armstrong reveals in this succinct, accessible
study, humanity’s genetic bulwark against cancer, p53, has featured
in more than 70,000 papers since its 1979 discovery. Armstrong
traces how the tumour-suppressor gene has effectively enhanced
our knowledge of cancer and inspired treatments, interweaving the
science with stories of patients and pathologists. Most vivid are the
quotidian triumphs and disappointments of ‘lab lifers’ such as Michel
Kress, one of the gene’s several independent discoverers, and Galina
Selivanova, working on a drug that restores function in mutant p53.


Vaccine Nation: America’s Changing Relationship with 
Immunization 

Elena Conis UNIVERSITY OF CHICAGO PRESS (2014) 

In the 1960s afterglow of broad success in defeating polio and 
smallpox, the US public embraced vaccination. Yet by 2009, debate 
was raging over its risks, even as some 90% of toddlers were
being vaccinated against a raft of diseases. Historian Elena Conis
analyses the shifts in official and public thinking on immunization as 
initiatives by presidents from John F. Kennedy onwards drove waves 
of mass vaccination. As she reveals, each new vaccine has prompted 
a radical reevaluation of the disease it targeted. 


Unnatural Selection: How We Are Changing Life, Gene by Gene
Emily Monosson ISLAND (2014)
“We beat life back with our drugs, pesticides and pollutants, but life
responds.” So writes environmental toxicologist Emily Monosson in
this examination of rapid evolution driven by artificial poisons. Her
tour takes in antibiotic-resistant staph bacteria, herbicide-resistant
agricultural weeds, DDT-resistant bedbugs and the blue crabs of Piles
Creek, New Jersey. Living in a soup of pollutants including mercury
and hydrocarbons, these decapodal survivors display altered
behaviours as well as resistance. Monosson ends with a thought-provoking
look at epigenetics: evolution “beyond selection”.


Virtuous Violence: Hurting and Killing to Create, Sustain, End, and 
Honor Social Relationships 

Alan Page Fiske and Tage Shakti Rai CAMBRIDGE UNIVERSITY PRESS (2014) 
Can murder or self-harm be seen as moral? Anthropologists Alan 
Fiske and Tage Rai argue that many who commit violent acts are 
motivated by feelings of moral rightness aimed at regulating social 
relationships. Despite the provocative title, the findings can seem 
commonsensical. From Mafia murders prompted by omerta (their 
code of honour) to god-appeasing sacrifice, moral justification for 
violent acts seems a near-constant in human behaviour. Barbara Kiser 
Correspondence


Evolution: students 
debate the debate 


I asked my third- and fourth- 
year undergraduate students 
whether they thought that 
evolutionary theory needs 
rethinking (see Nature 514, 161- 
164; 2014). More than two-thirds 
(26 out of 38) argued that it did 
not — because the synthesis 
proposed by Kevin Laland et al. 
has largely already occurred. 

Far from being neglected as 
Laland and colleagues imply, 
topics such as developmental 
bias, plasticity, niche 
construction and extra-genetic 
inheritance are well established 
in basic courses on evolutionary 
theory. Students today recognize 
that these processes can be 
both outcomes and causes of 
evolution. There is also a large 
body of work on co-evolutionary 
dynamics and interacting 
phenotypes (see, for example, 
any of the 400 or so papers that 
cite J. B. Wolf et al. Trends Ecol. 
Evol. 13, 64-69; 1998). 

Although all of my students 
agreed that the phenomena 
discussed by Laland and 
colleagues warrant further study, 
they — like the authors of the 
counterpoint piece, Gregory 
Wray et al. — did not view the 
authors’ ideas as an “alternative 
vision of evolution”. There 
would therefore seem to be no 
“struggle for the very soul of the 
discipline”. 

Hope Klug University of 
Tennessee, Chattanooga, USA. 
hope-klug@utc.edu 


Evolution: viruses 
are key players 


The debate on rethinking 
evolutionary theory (see Nature 
514, 161-164; 2014) should 
include viruses. By integrating 
into host DNA, viruses have 
markedly influenced the 
evolution and development 
of cellular organisms (see, for 
example, F. Baluska Ann. NY
Acad. Sci. 1178, 106-119; 2009). 
Viruses are the most abundant 


genetic entities on the planet. 
Almost all genomes of cellular 
organisms contain viral 
sequences, elements of which are 
now essential in gene regulation. 
Persistent endogenous 
retroviruses, for example, have 
contributed crucially to the 
evolution of the mammalian 
placenta. And the genetic 
variations that led to the 
evolution of adaptive immunity 
in vertebrates, or the equivalent 
system in prokaryotes, were not 
a result of random errors in DNA 
replication but of viral infection 
events (see L. P. Villarreal Viruses 
3, 1933-1958; 2011). 
Guenther Witzany Telos-Philosophische
Praxis, Bürmoos,
Austria. 
Frantisek Baluska University of 
Bonn, Germany. 
baluska@uni-bonn.de 


Evolution: networks 
and energy count 


Standard evolutionary theory 
should incorporate the 
complexity of adaptive evolving 
systems — including species, 
niches and environment — as 
dynamic relationship networks 
(see Nature 514, 161-164; 
2014). 

For example, epigenetic 
inheritance — which changes 
gene expression but not the 
DNA sequence — involves 
the storage of molecular 
information and its retrieval, 
transfer and processing at the 
supramolecular level. This 
involves transitory processes 
that are self-organized, self- 
assembled and dynamic. 

DNA replication too is one 
of countless functional tasks 
of interest in the study of 
evolution: changes propagate 
through interlinked levels 
of organization, inducing 
connectivity and interaction at 
all scales of the multilevel system. 

NATURE.COM
For more on the evolutionary theory
debate, see: go.nature.com/ghqfv9

The process of natural
selection is now being captured
by modelling fitness attractors 
that incorporate power laws and 
non-equilibrium steady states at 
the edge of chaos, with energy 
landscapes made of basins, 
valleys, floors, ridges and 
saddle points (see, for example, 
K. Friston J. R. Soc. Interface 10, 
20130475; 2013). 

Arturo Tozzi ASL Napoli 2 Nord, 
Naples, Italy. 
tozziarturo@libero.it 


Anti-vivisectionists 
respond 


Following our seven-month 
undercover investigation, the 
British Union for the Abolition 
of Vivisection (BUAV) strongly 
disagrees with your claim that 
the Max Planck Institute in 
Tübingen, Germany, has done
a “good job” on its website in 
explaining its neuroscience 
research on macaques (see 
Nature 513, 459-460; 2014). 

Our investigation of the 
macaques’ treatment and 
conditions was undertaken with 
SOKO-TS, a German animal- 
protection organization. The 
BUAV goes to enormous lengths 
to check facts and is extremely 
careful only to publish 
allegations that it believes are 
demonstrably true. 

After rigorously scrutinizing 
footage and documentation 
from this investigation, the 
leading German television 
station Stern has called into 
serious question claims and 
images posted on the Max 
Planck Institute's website. For 
example, the institute makes 
what in our opinion is the 
bizarre claim on its website that 
the animals do not suffer. 

Jane Goodall, the renowned 
primatologist, says she has 
seldom seen such sickening 
experiments. They have no 
place in a civilised society. 

Following the Stern 
broadcast, the institute 
has identified a need for 
improvement in terms of staff 
organization and agreed to 


introduce overnight care for the 
animals following surgery and 
to improve veterinary attention. 

We consider that the use of 
macaques in these experiments 
is unnecessary: the continued 
creative and ethical use of 
imaging techniques on patients 
and volunteers is, we believe, 
far more likely to produce 
improvements in neurological 
health. 

It is not better PR that animal 
researchers need, as you argue, 
but a paradigm shift in thinking, 
a better appreciation of the 
suffering they cause animals 
and a commitment to genuine
transparency. 

Michelle Thew BUAV, London, 
UK. 
michelle.thew@buav.org 


Ice-bucket challenge 
should jolt funding 


The Italian prime minister 
Matteo Renzi was among 
the vast number of people 
who accepted the ‘ice-bucket 
challenge’ this summer, 
helping to raise €2 million 
(US$2.5 million) in Italy for 
research into amyotrophic 
lateral sclerosis (ALS), also 
known as motor neuron disease 
(see Nature 514, 403-404; 2014). 

This sum exceeds his 
government's average annual 
budget for ALS research, which 
is still seriously underfunded 
— despite Italy ranking third in 
international ALS publications, 
after the United States and 
Japan. 

ALS researchers worldwide 
are waiting to see how the 
sum of around $100 million 
that has been collected by this 
philanthropic phenomenon 
will be used, and whether it will 
boost governments’ plummeting 
contributions towards basic 
research. I hope so: without 
such funds there can be no 
development of new drugs for 
this incurable disease. 
Maria Teresa Carri University of
Rome ‘Tor Vergata’, Italy.
carri@bio.uniroma2.it
OBITUARY


Allison Doupe 


(1954-2014) 


Neuroscientist and psychiatrist who linked birdsong and human speech. 


In this era of interdisciplinary
science, there is a common
phrase: much is known but
in different heads. Occasionally,
multiple disciplines come together
in one remarkable head.

Allison Doupe, a systems neuroscientist,
avian biologist and clinical
psychiatrist, brought together
many perspectives to give us a new
understanding of birdsong and,
ultimately, of human speech. Straddling
the bird laboratory and the
clinic, she discovered the principles
by which birds learn their songs,
and used these insights to propose
the neural basis for learning various
motor skills, including speech,
in humans.

Doupe, who died of cancer on
24 October, grew up in Montreal,
Canada, where she attended French-speaking
schools. After graduating
from McGill University in Montreal,
she moved to Harvard University in
Cambridge, Massachusetts, where
she simultaneously earned a PhD in
neurobiology and an MD from the
medical school. Doupe continued
to pursue both science and medicine
on the west coast. She trained in psychiatry
at the University of California, Los Angeles,
and then completed a five-year postdoctoral
fellowship at the California Institute of Technology
in Pasadena with avian neurobiologist
Mark Konishi. It was this fellowship that got
her hooked on birdsong.

In the late 1980s, avian neurobiology was
an exciting discipline. Studies, including
those in which researchers recorded from,
or lesioned, different parts of a bird’s brain,
had revealed most of the major structures
involved in producing and learning songs.
The production of song was governed by
well-defined clusters of neurons innervating
the vocal muscles. Song learning relied on
a complex network of specialized forebrain
areas, including auditory and motor-control
centres that form a sensory-motor circuit.
Neurobiologists knew that birdsong was
learned during a crucial period early in life
through imitation, usually of a parent.

Doupe was intrigued by the parallels
between birdsong and human speech.
Unlike many other taxa, birds and humans
rely on imitative learning as well as auditory
feedback to develop normal communication
skills. In a now-classic 1999 Annual Review
of Neuroscience paper with linguist Patricia 
Kuhl, Doupe laid out for the human-speech 
research community the mechanistic 
questions that had been explored in bird- 
song (A. J. Doupe and P. K. Kuhl Ann. Rev. 
Neurosci. 22, 567-631; 1999). This paper 
framed the study of both avian and human 
communication for the next decade. 

One problem that needed solving was 
how infant humans and juvenile birds match 
what they hear from adults with the sounds 
that they produce as they begin to vocalize. 
As a postdoctoral fellow, Doupe became 
intrigued by the auditory template hypoth- 
esis: when young birds hear adults sing, they 
form an auditory memory of the sounds they 
hear, even though they are as yet unable to 
reproduce them. Birds, like humans, prac- 
tise until the sounds they produce match the 
auditory template in their brain. 

By recording the activity of individual 
neurons, Doupe found that the adult song 
was represented within the young bird’s sen- 
sorimotor pathway, now called the anterior 
forebrain pathway. In this network, she also 
discovered neurons that selectively responded 
to the bird’s own song, but not the
song of an adult tutor. Doupe and her 
students suggested that whenever 
young birds practised their songs, 
the electrical signals sent to the 
motor pathway were compared with 
a parallel discharge sent through the 
song-learning pathway, where the 
template adult song was stored. This 
‘efference copy’ model, although still 
theoretical, has proved useful for 
understanding brain activity during 
learning, including the acquisition of 
human speech. 

Doupe’s birdsong research, along 
with her clinical experience, ulti- 
mately led her to questions about 
the role of social context. She and 
her students demonstrated how 
the anterior forebrain system in 
songbirds generates the variation 
in performance needed for birds to 
improve their song during practice 
sessions, yet allows the birds to sing 
stereotyped renditions in the pres- 
ence of a potential mate. This work 
established the birdsong system as 
a model for understanding many 
aspects of sensorimotor control and 
its development in humans, includ- 
ing the importance of generating variation 
to allow learning. Many have compared the 
anterior forebrain pathway in songbirds to 
the cortico-basal ganglia system in humans 
— the region involved in the learning of 
skills that become habitual, such as driving, 
typing and walking. 

In life, as in her science, Allison was 
passionate about development and learning. 
Her devotion to her twin sons, now ten, like 
her dedication to her many students, postdocs 
and patients, was legendary. Her work will 
continue under the guidance of her extended
scientific family, including her husband and 
collaborator, Michael Brainard. But for all of 
us who learned the importance of tutoring 
from working with her, her absence will make 
our work a little less perfect.


Thomas R. Insel is director of the US 
National Institute of Mental Health, 
Bethesda, Maryland, USA. Story Landis is 
former director of the US National Institute 
of Neurological Disorders and Stroke, 
Bethesda, Maryland, USA. 

e-mails: tinsel@mail.nih.gov; 
landiss@ninds.nih.gov 
Mice in the ENCODE spotlight


Following on from affiliated projects in humans and model invertebrates, the Mouse ENCODE Project presents 
comprehensive data sets on genome regulation in this key mammalian model. SEE ARTICLES P.355, P.365, P.371 & LETTER P.402 


PIERO CARNINCI 


The mouse genome was sequenced in
2002 as a primary model in which to
study gene function and human diseases
and to develop drugs. This was followed
by maps of transcribed messenger RNA molecules
and of long, non-protein-coding RNAs,
which facilitated such experiments and analysis.
Yet although 17 mouse strains have been
sequenced, genome function and regulation
cannot be understood by sequence analysis
alone. Now, in four papers published in this
issue, the Mouse ENCODE Consortium presents
data sets that dramatically enhance our
understanding of the regulation of the mouse 
genome, and of the similarities and differences 
compared with the human genome. 

The ENCODE project was started by the
National Human Genome Research Institute 
in 2003, with the aim of mapping functional 
elements of the human genome. The pro- 
ject, later expanded as Mouse ENCODE and 
modENCODE (to include invertebrate model 
organisms), has driven technology develop- 
ment and standardization for the identifica- 
tion of expressed RNAs and regulatory regions. 
These technologies have given rise to compre- 
hensive data sets for analysing genome regula- 
tion and comparing this across species. Among 
the resources are libraries of mRNA sequences 
and maps of genomic regions that are bound 
by transcription factors or by RNA polymer- 
ases (the enzymes that initiate RNA transcrip- 
tion). There are also data sets on chemical 
modifications to the histone proteins around 
which DNA is wrapped (forming a complex 
called chromatin). Such modifications alter the 
accessibility of the DNA to other proteins and 
thereby demarcate transcriptionally ‘active’ or 
‘repressed’ chromatin regions. And there are 
data on large-scale chromatin and chromo- 
some structures. 

The Mouse ENCODE Project has taken 
advantage of the ENCODE experience to pro- 
vide a much-needed comprehensive resource 
for mouse genomics and its first in-depth 
analysis. Stergachis and colleagues’ data
(page 365) reveal that, in the roughly 75 mil- 
lion years of evolution since humans and 
mice diverged, the primary (nucleic-acid) 
sequence of regulatory elements has changed 
dramatically. About half of the transcription-factor
binding sites in regulatory elements
of the mouse genome are not present in the
equivalent (orthologous) elements in humans,
and around one-quarter of them have migrated
to different positions (Fig. 1). Regulatory elements
that are distant from the gene that they
regulate (enhancers) have diverged more than
those that are close (promoters). Despite this
divergence, Cheng et al. (page 371) show that
there is similar chromatin activity in orthologous
promoter regions in the two genomes,
suggesting that different transcription factors
could be used to achieve similar transcriptional
activity. Furthermore, despite the different primary
sequences of many regulatory elements,
the basic reciprocal regulatory networks among
transcription factors are evolutionarily conserved
between mice and humans.

Figure 1 | Transcription-factor binding in mice and humans. Gene transcription rates are regulated by
transcription factors, which bind to promoter regions close to the specific gene or to enhancer regions at
distant sites. Comparisons of maps of such binding sites generated by the mouse and human ENCODE
projects suggest that many differences in transcription levels between equivalent (orthologous) genes
in the two organisms result from transcription-factor binding sites (labelled as TFs) occupying different
locations. A further regulatory influence is the insertion of retrotransposon elements (stretches of DNA
derived from reverse transcription of RNA) that may contain transcription-factor binding sites.

Surprisingly, the Mouse ENCODE Consortium
(Yue et al.; page 355) finds that sequences
commonly considered useless or harmful, such 
as retrotransposon elements (stretches of DNA 
that have been incorporated into chromosomal 
sequences following reverse transcription from 
RNA), have species-specific regulatory activ- 
ity. Because retrotransposon elements can con- 
tain embedded transcription-factor binding 
sites, this may provide unexpected regulatory 
plasticity (Fig. 1). Evolutionary conservation
of primary sequence is typically considered 
synonymous with conserved function, but 
this finding suggests that this concept should 
be reinterpreted, because insertions of retro- 
transposon elements in new genomic regions 
are not conserved between species. 

Although gene expression might be 
expected to be similar in the same organs 
and tissues in different species, comparative 
analyses by the consortium reveal that the
expression level of many genes (but not all 
gene categories) is species specific, rather than 
organ specific. These differences may derive 
from the fact that organs are composed of 
different cell types in mouse and human 
tissues, but they are more likely to have arisen from
different basic transcriptional activity driven 
by different regulatory elements. 

Despite these variations between the mouse 
and human genomes, Cheng et al. show that
many single-nucleotide sequence differences 
that have been associated with diseases in 
genome-wide association studies in humans 
are localized to orthologous regions of the 
mouse genome that have modifications that 
mark active chromatin. This finding validates 
the importance of the mouse as a model organ- 
ism for ongoing disease studies. 


Finally, Pope et al. (page 402) have
generated high-quality maps of the physi- 
cal position of chromosomes in the nuclei of 
mouse and human cells. These maps show 
that the boundaries of replication domains 
(genomic regions that replicate at the same 
time during cell division) correlate well with 
topologically associating domains — chromo- 
some structures that are associated with the 
regulation of gene expression. 

Analysis of these data will continue, both 
broadly and in the context of specific biologi- 
cal questions, although new tools for visual- 
izing, analysing and interpreting such data are 
needed to open them up for broader use by 
experimental biologists. But the existing find- 
ings are already thought-provoking. For exam- 
ple, they suggest that we should rethink the 
relationship between genomic function and 
evolutionary conservation. Regulatory regions 
and long non-coding RNAs (lncRNAs) are not
subject to the evolutionary constraints of pro- 
tein-coding genes, which may help to explain 
the sequence drifts reported in these papers. 
However, it is striking that transcription-factor 
networks are conserved despite low conserva- 
tion of their binding positions in the genome. 
Further experiments are needed to establish 
whether transcription-factor interactions with 
regulated regions always promote transcrip- 
tion or whether they can also be repressive. 
The differences in regulation between mouse
and human genomes that have emerged from
these studies should all be taken into account 
when using mouse models to assess biological 
functions and, in particular, drug responses. 

Some genomic features in particular, such as
lncRNAs, warrant further investigation. The
Mouse ENCODE Project analysed only RNA
molecules that are polyadenylated (they have a
string of adenine bases at the 3′ end). Although
this modification marks most mRNAs, many
lncRNAs are not polyadenylated, so analysis
of non-polyadenylated RNAs in mice will be
needed to better define the similarities and
differences between the full complement of RNA
transcripts in mice and humans. A comprehensive
map of orthologous human and mouse
lncRNAs will also be useful for experimental
tests of the function of human lncRNAs in mice.

Furthermore, there is room to expand the
data set on transcription-factor binding sites
generated by Cheng and colleagues. Their
experiments were performed using mouse
cells that are easy to cultivate (MEL and CH12)
and thus provide plenty of experimental
material, but these cells do not represent the
biological variability present in the hundreds
of cell types found in mammals. It will also
be useful to replicate these studies in different
mouse strains and to connect differences in
genome sequence between the strains to
differences in gene regulation and traits.

The data sets provided by the Mouse
ENCODE Project boost our capacity to analyse
the mouse genome in a way that was
unthinkable a decade ago, and allow us to gain
insights into dimensions that were not
foreseeable. Understanding genomic regulation in
mice is much more than a linear addition to
our knowledge of genome regulation overall:
it is an essential step towards better
understanding human biology and improving
biomedical applications and drug development.


Piero Carninci is at the RIKEN Center for 
Life Science Technologies, Division of Genomic 
Technologies, RIKEN Yokohama Campus, 
Yokohama, Kanagawa 230-0045, Japan. 
RNA made in its
own mirror image 


An RNA enzyme has been generated that can assemble a mirror-image version of 
itself. The finding helps to answer a long-standing conundrum about how RNA 
molecules could have proliferated on prebiotic Earth. SEE LETTER P.440 


SANDIP A. SHELKE & JOSEPH A. PICCIRILLI 


Many organic and biological molecules
exist in right-handed and left-handed
versions that are mirror-image twins of
one another. These variations
are referred to as D- and L-enantiomers, 
respectively. Modern RNA molecules are 
linear polymers that are synthesized from
ribonucleotide monomers, and take the D-form.
But on page 440 of this issue, Sczepanski and
Joyce suggest that early evolution may have
involved an interplay between the D- and
L-structures of RNA.

Before DNA and proteins existed, RNA 
may have evolved as the primordial macro- 
molecule that could both store information 
like DNA does and catalyse chemical reactions 
like many proteins do. According to this ‘RNA
world hypothesis’, one of the functions of
these RNA enzymes (called ribozymes) was
to replicate other RNA molecules by using
their sequences as templates to make
complementary strands. This function, called
polymerization, involves the chemical joining
of ribonucleotide monomers or oligonucleotides
(short sequences of monomers).

Figure 1 | Possible mechanism for RNA replication on prebiotic Earth. Sczepanski and Joyce have
generated an RNA enzyme (a ribozyme) that catalyses the polymerization of oligonucleotides of the
opposite handedness to itself: the right-handed D-ribozyme yields the left-handed L-ribozyme, and
vice versa. This adds weight to the idea that a cross-handed cycle involving both D- and L-ribozymes may
have replicated RNA on prebiotic Earth. In the cycle, the L-ribozyme acts on a complex formed between
a D-template RNA strand and D-oligonucleotides, joining the latter together to form a duplex RNA
product. Separation of the duplex’s strands liberates the D-ribozyme. This then catalyses formation of the
L-ribozyme from the left-handed template-oligonucleotide complex.

Some 30 years ago, a conundrum arose 
concerning how RNA molecules first prolif- 
erated through prebiotic chemical reactions. 
This was because of the demonstration by 
Joyce et al. that the non-enzymatic copying 
of an RNA template to form a complementary 
RNA strand could be brought to a screeching 
halt by the incorporation into the growing 
polymer of monomers of opposite handedness
to the template. This phenomenon was
termed ‘enantiomeric cross-inhibition’. Given
that both D- and L-enantiomers of RNA
molecules were probably present as substrates on
prebiotic Earth, how could template-directed 
polymerization have proceeded? Sczepanski 
and Joyce now revisit this issue by creating a 
ribozyme that not only catalyses template- 
directed polymerization in the presence of 
both D- and L-enantiomers, but actually prefers
mononucleotides and oligonucleotides of the 
opposite handedness to itself as its substrates. 

The authors synthesized a pool of
right-handed D-RNA polymers of random sequences
and linked them covalently to a left-handed
L-RNA template in the presence of left-handed
oligonucleotide substrates. They then used
in vitro selection to isolate RNA species
from the pool that could join (polymerize) the 
substrates. After ten rounds of selection and 
amplification of catalytic molecules; pruning 
of superfluous sequences; insertion of another 
randomized segment to create a new pool; 
and then another six rounds of selection and 
amplification, a D-ribozyme was isolated that 
could perform template-directed joining of 
L-substrates about a million times faster than 
in the uncatalysed reaction.

As with ribozymes previously selected and 
further optimized for polymerization activity,
this ribozyme resembles modern-day polymeri- 
zation enzymes (polymerases) in several ways. 
First, it can operate on completely separate 
template-substrate complexes, implying that 
sequence-independent contacts form between 
it and the complexes. Second, it can perform 
limited polymerization by catalysing the 
sequential joining of several mononucleotides. 
In addition, as long as the oligomeric substrates 
are bound to their complementary templates, 
the ribozyme seems to be indifferent to sub- 
strate length. In fact, the authors observed that 
it can connect 11 L-oligonucleotides to form a 
mirror copy of itself, a remarkable first demon- 
stration of an enzyme (RNA or protein) being 
synthesized by its own enantiomer. Importantly, 
the D-ribozyme and its L-enantiomer efficiently
catalyse their respective joining reactions even
in a mixture containing both D- and L-versions
of the substrates and templates. In other words, 
the enantiomeric cross-inhibition that thwarted 
non-enzymatic template-mediated replication 
does not occur. 

This work adds weight to the notion of a
primordial RNA world in which cycles of
cross-handed replication used mirror-image
forms of RNA (Fig. 1). Such mutualistic
coupling of D- and L-RNA polymerases might
have conferred several advantages on RNA
evolution, and may now benefit researchers
who aspire to create RNA polymerase
systems that can self-replicate. Because of their
shapes, D- and L-RNA molecules cannot
form consecutive Watson-Crick base pairs
with each other, just as a left hand and a right
hand cannot properly shake each other.
Consequently, Sczepanski and Joyce's polymer- 
ase is unlikely to exhibit sequence preferences 
or restrictions owing to duplex formation 
between complementary sequences in the 
ribozyme and in templates or reaction prod- 
ucts. These problems have confounded the 
experimental search for a general polymerase 
that can copy RNA sequences without bias, 
and may similarly have affected the course 
of early macromolecular evolution. Further- 
more, the dependence on two distinct, coupled 
polymerases makes the replication cycle less 
susceptible to invasion by molecular parasites 
that could usurp the chemically activated sub- 
strates needed for the polymerization reaction. 

Viewing the work in the context of
evolution raises the question of how an all-D or
all-L ribozyme could have arisen to begin with.
Sczepanski and Joyce suggest that simple 
forms of nucleic acids that lacked the mirror 
twin served as templates for polymerization of 
RNA mono- or oligonucleotide substrates on 
prebiotic Earth. It remains unclear, however, 
whether RNA polymerization from such tem- 
plates would be immune to deleterious enan- 
tiomeric cross-inhibition. 

Beyond the implications for the RNA world 
hypothesis, the new ribozyme may have 
practical value for the production of
spiegelmers, which are L-versions of functional D-RNAs
(the name derives from the German word for
‘mirror’: Spiegel). Spiegelmers resist degradation
by nucleases — the enzymes that degrade 
nucleic acids — and seem to avoid detection 
by the immune system, making them attractive 
therapeutic candidates and sensors for biologi- 
cal ligands. However, because natural enzymes 
do not recognize L-nucleotides, spiegelmers 
can be made only by chemical synthesis, which 
limits access to longer spiegelmers. Ribozyme- 
catalysed cross-handed polymerization 
might enable convenient enzymatic access 
to spiegelmers, and eventually render them 
directly amenable to in vitro selection methods. 
Sczepanski and Joyce's twin polymerases 
will probably require further engineering 
before they can copy long RNA templates of 
any sequence efficiently and accurately.
Nevertheless, successive improvements have been
made for other in vitro-selected ribozymes,
providing reason for optimism in this case.
Reactive walls


Domain walls are natural borders in ferromagnetic, ferroelectric or 
ferroelastic materials. It seems that they can also be reactive areas that produce 
crystallographic phases never before observed in bulk materials. SEE LETTER P.379 


PHILIPPE GHOSEZ & JEAN-MARC TRISCONE 


Interfaces between different oxide materials
have been attracting much attention.
They are seen as a source of new properties
and functionalities. Engineering these
borders has already revealed phenomena such
as conductivity and superconductivity at the
frontier between insulating compounds, and
magnetism between non-magnetic materials.
In fact, it is not even necessary to combine
different materials to create interfaces. Ferroic
materials such as ferromagnets, ferroelectrics 
and ferroelastics naturally break into domains 
characterized by different orientations of the 
material's spontaneous ferroic order: magnetization
for ferromagnets, electric polarization
for ferroelectrics and macroscopic
deformation for ferroelastics. These domains
are separated by interfaces called domain
walls. On page 379 of this issue, Farokhipoor
et al. highlight that, instead of being a
passive region that accommodates the different
ferroic-order orientations of the domains
that border it, a domain wall can also be a
reactive area that generates and stabilizes new
two-dimensional crystallographic phases not
achievable by conventional means.

Figure 1 | Atomic motions and domain-wall structure. a–c, The three types of atomic distortion that
coexist in the Pbnm orthorhombic phase ($a^-a^-c^+$ phase in Glazer’s notation) of TbMnO₃: a, in-phase
rotations ($a^0a^0c^+$) of oxygen octahedra around the z axis with amplitude $R_z^+$; b, anti-phase rotations
about the x (left; $a^-a^0c^0$) and y (right; $a^0a^-c^0$) pseudo-cubic directions with amplitude $R_x^- = R_y^-$;
c, anti-polar Tb-cation motions along the x and y axes with amplitude $D_x = D_y$. d, The global atomic
motions (top) and the amplitude (bottom) of the individual distortions around a 90° ferroelastic domain
wall in this material. Farokhipoor et al. showed that, at such a domain wall, the amplitude of $R_x^-$ is reversed,
producing a reversal of the associated Tb displacement and a saw-like steric effect modulated at the atomic
scale (black arrows) that causes a substitution of Tb atoms by smaller Mn atoms in every other row.
Tb, terbium; Mn, manganese.

Owing to spatial symmetry breaking and 
local mechanical constraints, a domain wall 
may exhibit properties distinct from the two 
domains that surround it. The physics of 
domain walls is expected to become remark- 
ably complex in multiferroics (materials that 
have two or more forms of ferroic order), in 
which different types of domain wall coexist 
and can be coupled. Clear understanding
of what happens at domain walls remained 
elusive until the last few years, when imaging 
techniques such as atomic force microscopy 
and high-resolution transmission electron 
microscopy, combined with first-principles 
calculations, provided access to atomic-scale 
characterization of the walls. This char- 
acterization brought to light unexpected 
phenomena such as the conducting behaviour 
of ferroelectric domain walls in the insulating
multiferroic bismuth ferrite (BiFeO₃) and
other non-conducting oxides, and opened
perspectives for domain-wall nanoelectronics.


In their study, Farokhipoor et al. focused
on ferroelastic domain walls in terbium
manganite (TbMnO₃). Beyond revealing a
new functionality of domain walls related to
the local stabilization of an exotic two-dimensional
crystallographic phase, the authors also
explained the appearance of a net magnetization
in the otherwise antiferromagnetic
low-temperature phase of thin films of TbMnO₃
and related compounds; in a purely antiferromagnetic
phase, spins of neighbouring electrons
point in opposite directions, producing
no net magnetization.

TbMnO₃ is a distorted perovskite, a compound
of general formula ABO₃, where A
and B are two cations of different size and O
is oxygen. At low temperatures, bulk TbMnO₃
develops a spiral spin structure that breaks
spatial inversion symmetry and induces an
electric polarization, making it a magnetoelectric
multiferroic. At the structural level,
it adopts at low temperature a common ‘Pbnm
orthorhombic’ lattice configuration, which
can be viewed as a distortion of the ideal,
high-temperature cubic structure. The distortion
primarily involves in-phase rotations of
oxygen octahedra about the vertical (z) axis
with amplitude $R_z^+$ (Fig. 1a) and anti-phase
rotations of oxygen octahedra about the horizontal
(x) and (y) pseudo-cubic directions with
equal amplitude ($R_x^- = R_y^-$; Fig. 1b).

When TbMnO₃ is grown in epitaxial
thin-film form on a strontium titanate
(SrTiO₃) substrate, as in the present study,
it preserves such an oxygen rotation pattern,
with the $R_z^+$ rotation axis aligned along
the growth direction; in epitaxial growth,
the film’s atoms are ‘aligned’ with atoms in
the underlying substrate. As discussed by
Farokhipoor et al., to accommodate the
mechanical constraint induced in the film by
the epitaxial growth process, the film naturally
develops 90° ferroelastic domain walls, which
are associated with a reversal of one of the $R_x^-$
or $R_y^-$ rotation patterns ($R_x^-$ in Fig. 1d).

By combining experimental and first- 
principles techniques, Farokhipoor and col- 
leagues demonstrate that, to release the specific 
mechanical stress inherent in such a domain 
wall, a systematic chemical substitution of Tb 
atoms by smaller Mn atoms occurs in every 
other row along the film’s growth direction, 
producing a new phase with unexpected 
square-planar MnO₄ groups. The extra Mn
atoms at the domain wall are responsible for
unusual magnetic properties: being located
between consecutive MnO₂ planes that are
antiferromagnetically coupled, these atoms 
are magnetically frustrated — that is, they 
cannot simultaneously align or anti-align their 
spins with those of both neighbouring planes. 
Such magnetic frustration leads to canting 
of neighbouring spins and produces a net 
magnetization. 

It is important to understand that the driv- 
ing force for the chemical substitution at the 
domain wall is not the motions of the oxygen 
octahedra themselves, but the presence of 
additional Tb displacements in the horizontal 
direction. In ABO₃ perovskites, such anti-polar
A-cation motions (here, Tb displacements; Fig.
1c) of amplitude $D_x = D_y$ are intrinsic to the
Pbnm phase: they are naturally induced by the
oxygen rotations through linear coupling of the $R_x^-$
($R_y^-$), $R_z^+$ and corresponding $D$ distortions. As
previously discussed in another context, such odd
coupling of three distortions mandates that the
reversal of $R_x^-$ at the domain wall (while keeping
$R_z^+$ unchanged) produces a reversal of the coupled
Tb displacement.
This latter reversal translates into opposite
motions of the Tb cations on the left and right
sides of the domain wall (Fig. 1d), and creates
a non-uniform steric (geometric) effect that is
responsible for the selective chemical substitution.
This effect is therefore not restricted to
TbMnO₃, but should be generic for this type
of domain wall in orthorhombic perovskites.

It is usually understood that domain walls 
adjust to release the stress they are subjected 
to. This is true, but what happens here is more 
subtle than a simple elastic relaxation. The 
stress produced at the domain walls studied 
by Farokhipoor et al. is far from homogeneous. 
The anti-polar Tb motions produce a peculiar 
saw-like steric effect modulated at the atomic 
scale. The domain wall therefore seems to be 
a unique confined environment that is able to
generate and stabilize new crystallographic
phases not necessarily achievable by other
means.
Succinate strikes


The high levels of tissue-damaging reactive oxygen species that arise during 
a stroke or heart attack have been shown to be generated through the 
accumulation of the metabolic intermediate succinate. SEE LETTER P.431 


LUKE A. J. O'NEILL 


When a stroke or a heart attack strikes,
the tissue injury that occurs can be
devastating. This damage to the
brain or heart is a result of an initial starving 
of oxygen owing to blocked blood flow, 
followed by reoxygenation once blood 
flow is restored. Ischaemia reperfusion 
(IR) injury, as it is called, is a major 
health burden, and there are very few 
options to prevent it. On page 431 of 
this issue, Chouchani and colleagues
present a finding that might inspire a 
new therapeutic approach. They reveal 
that succinate, an intermediate mole- 
cule normally formed during cellular 
respiration, is consistently elevated in 
ischaemic tissues, and that preventing 
this elevation is remarkably protective 
against IR injury in mouse models of 
stroke and heart attack. These find- 
ings add to those from other studies 
implicating succinate as an injurious 
metabolite, the limitation of which 
might have clinical utility.

The study began with an investiga- 
tion into why tissue-damaging mol- 
ecules called reactive oxygen species 
(ROS) are produced at abnormally 
high levels during IR injury. ROS are
formed as a by-product of cellular res- 
piration — the series of reduction and 
oxidation reactions, occurring in orga- 
nelles called mitochondria, that gen- 
erates energy from the breakdown of 
nutrients. The authors proposed that 
any changes in metabolite levels dur- 
ing ischaemia and reperfusion might 
predict the source of excessive ROS. 
They blocked blood flow to four tis- 
sues (brain, kidney, heart and liver) 
in mice, and found succinate to be
elevated in all four, by as much as 19-fold,
over ischaemic periods of 45 minutes. In fact, 
succinate was the only intermediate of mito- 
chondrial metabolism found at altered levels 
in all the ischaemic tissues. 

Figure 1 | Succinate in inflammation and infarct. Chouchani et al.
show that the metabolic intermediate succinate is markedly elevated
during ischaemia (oxygen deprivation to a tissue as a result of
blocked blood supply). This accumulation occurs through the reverse
activity of the enzyme succinate dehydrogenase (SDH). On blood
reperfusion, the succinate is oxidized, leading to reverse electron
transport through complex I (a multiprotein enzyme complex),
which generates reactive oxygen species (ROS), molecules that
mediate the infarct (damaged tissue) seen in strokes and heart attacks
and that promote inflammation. Succinate has also been implicated
in inflammation driven by macrophage cells that are activated
when the receptor TLR4 is bound by the bacterial component
lipopolysaccharide (LPS) or, perhaps, by products of ischaemic
tissue. In this case, the succinate is generated from the metabolism
of glutamine, and leads to activation of the transcription factor
HIF-1α and expression of genes encoding pro-inflammatory proteins.

If succinate were fuelling the ROS accumulation,
Chouchani et al. predicted that it would
be rapidly oxidized during reperfusion, when
oxygen is plentiful; indeed, they observed 
that succinate levels returned to normal after 
5 minutes of reperfusion. They then addressed 
where the succinate might be coming from, 
and tested an earlier speculation that the
enzyme succinate dehydrogenase (SDH), 
which breaks down succinate during normal 
oxygen-consuming cellular respiration, might 
act in reverse under anaerobic conditions. This 
also proved to be the case — the researchers 
found that succinate is generated from its usual 
downstream metabolite fumarate in the ischae- 
mic tissues through the action of SDH, and 
that treatment of mice with a form of malonate, 
an SDH inhibitor, decreased succinate accu- 
mulation during ischaemia and reduced the 
extent of tissue damage in models of both 
heart and brain IR injury. Further- 
more, in the brain model, malonate 
treatment prevented the decline 
in neurological function and sen- 
sorimotor function associated 
with stroke. 

The authors went on to identify that 
excessive ROS production occurs when 
SDH drives reverse electron transport 
through mitochondrial complex I, the 
first enzyme complex in the cellular- 
respiration chain (Fig. 1). This reverse 
electron transport occurs because, on 
reperfusion, the succinate that has 
accumulated is rapidly oxidized, lead- 
ing to over-reduction of the cellular 
pool of coenzyme Q molecules, which 
are crucial electron carriers during 
respiration. The over-reduction drives 
electrons back through complex I, 
generating ROS in the process. The 
researchers also show that blocking 
electron flow through complex I using 
the chemical compounds rotenone or 
mitochondria-targeted S-nitrosothiol
inhibits the increase in ROS in tissues 
undergoing reperfusion after ischae- 
mia. 

These findings join those of several
other studies pointing to succinate as
an inducer of inflammation. Most
notable among these is the finding
that macrophage cells of the immune
system are induced to produce
succinate following activation
through Toll-like receptor 4 (TLR4),
which recognizes lipopolysaccharide, a
component of the cell walls of some bacteria
(Fig. 1). In that situation, the succinate is
generated from the amino acid glutamine
and acts to stabilize the transcription factor
HIF-1α, which in turn leads to an increase
in activity of HIF-1α-dependent genes, one
of which encodes the pro-inflammatory
molecule IL-1β (ref. 2). Of direct relevance
to the current study are observations that
implicate TLR4 (and TLR2) in IR injury in
the heart. It is possible that macrophage
TLRs are bound by products of damaged 
tissue during ischaemia, activating the cells 
to produce succinate and thus contributing 
to IR injury. 

Succinate is also elevated in other inflammatory
conditions, including colitis and
rheumatoid arthritis, and it is possible that
succinate generates ROS in those conditions
through complex I, as shown by Chouchani
and colleagues. In addition, binding of succinate to a
receptor called SUCNR1, which is expressed
by dendritic cells of the immune system, has
been shown to enhance the production of
pro-inflammatory molecules by these cells when
they are activated by TLR binding.

Chouchani and co-workers’ study should 
therefore stimulate further analysis not only 
of the importance of succinate as a mediator 
of IR injury, but also of the molecule’s broader 
role in inflammatory conditions and disease 
states involving mitochondrial ROS. Prevent- 
ing succinate accumulation could bring ben- 
efits by limiting inflammation in conditions 
such as sepsis or rheumatoid arthritis, and 
may provide a new approach for limiting the 
damage caused by heart attack or stroke. Ulti- 
mately, the targeting of the events described 
here could result in much-needed therapies 
for patients for whom there are currently 
limited options.
Agriculture and the
global carbon cycle 


Evolving agricultural practices dramatically increased crop production in the 
twentieth century. Two studies now find that this has altered the seasonal flux of 
atmospheric carbon dioxide. SEE LETTERS P.394 & P.398 


NATASHA MACBEAN & PHILIPPE PEYLIN 


The concentration of carbon dioxide in
the atmosphere undergoes seasonal,
cyclic variation, the amplitude of which
has increased by up to 50% in the Northern
Hemisphere over the past 50 years. Several
factors have been proposed to explain this
increase, including the response of the
terrestrial biosphere to climate change,
increased fossil-fuel emissions, and changes
in oceanic fluxes and atmospheric transport
of CO₂, but the relative magnitude and
latitudinal contribution of each are still debated. In
two studies published in this issue, Gray et al.
(page 398) and Zeng et al. (page 394) reveal
that intensification of agriculture has
contributed substantially to this trend.

The atmospheric CO₂ concentration has
increased at an unprecedented rate during
the past few decades. We know from a global
network of atmospheric CO₂ measurements
that only roughly half of the emissions associated
with fossil-fuel use and land-use change
remain in the atmosphere. The ocean and
land surface must therefore act as a global carbon
sink, although its magnitude and location,
and the mechanisms driving it, remain
uncertain because of the difficulty of measuring
and modelling carbon stocks and fluxes at
large scales. Improving our knowledge of the
driving mechanisms is essential for accurate
projections of the global carbon budget under
future climate and land-use changes.

Atmospheric CO₂ data can provide an
integrated, albeit indirect, measure of the
global carbon budget, and so it is crucial to
understand the causes of spatiotemporal
variability in these data. Much focus has been put
on the growth rate of the annual mean CO₂
concentration and its year-to-year variability.
By contrast, less attention has been paid to
the observed increase in the amplitude of the
seasonal CO₂ cycle in the extratropics of the
Northern Hemisphere (regions at latitudes of 
30° to 90° N), which results from higher car- 
bon uptake in the summer and greater release 
in the winter. 

Agricultural productivity has previously
been proposed as a possible cause. Crops can
have a stronger impact on carbon uptake than
can natural vegetation, because of their high
productivity. The widespread use of fertilizers,
irrigation and high-yield crop cultivars has led
to a threefold growth in global agricultural
production in the past 50 years, with only a small
expansion of cropland area (Fig. 1). Gray et al.
and Zeng et al. are the first to demonstrate that
agricultural productivity really has affected the
amplitude of the annual CO₂ cycle.


Figure 1 | Agricultural revolution. The expansion of irrigation infrastructure during the twentieth 
century helped to intensify crop production and improve yields. Two papers report that this
intensification has increased the amplitude of seasonal variations of atmospheric carbon dioxide levels in 
the Northern Hemisphere. 
Gray and colleagues used a carbon-accounting
method and crop-production
statistics published by the Food and Agriculture
Organization of the United Nations to calculate
how much carbon was taken up by four major
crop types, maize (corn), wheat, rice and
soya beans (collectively called MWRS), in
the northern extratropics each year from 1961
to 2008. They found that the annual exchange
of carbon between crops and the atmosphere
increased by 0.33 petagrams (1 petagram is 10¹⁵
grams) during this period, mainly because of
farming in northern China and the midwestern
United States. The authors conclude that
the rise in MWRS production is responsible
for 17-25% of the increase in the seasonal carbon
flux required to explain observed changes
in atmospheric CO₂ seasonality, with maize
alone accounting for 66% of this increase.
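
As a rough back-of-envelope check, and not a calculation taken from Gray and colleagues' paper, the two figures quoted above can be combined to see what total increase in seasonal carbon flux they imply. The sketch assumes that the 17-25% share applies directly to the 0.33-petagram increase in crop carbon exchange; the function name is purely illustrative.

```python
# Back-of-envelope sketch: if a 0.33 Pg increase in crop carbon exchange
# represents 17-25% of the required increase in seasonal flux, how large
# is the implied total? Assumes the share applies directly to 0.33 Pg.

def implied_total_flux(crop_increase_pg, share):
    """Return the total flux increase (Pg) implied by a fractional share."""
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction between 0 and 1")
    return crop_increase_pg / share

CROP_INCREASE_PG = 0.33            # MWRS increase, 1961-2008 (from the text)
SHARE_LOW, SHARE_HIGH = 0.17, 0.25  # range of shares quoted in the text

# A smaller share implies a larger total, so the bounds swap over.
total_low = implied_total_flux(CROP_INCREASE_PG, SHARE_HIGH)
total_high = implied_total_flux(CROP_INCREASE_PG, SHARE_LOW)

print(f"Implied total increase: {total_low:.2f} to {total_high:.2f} Pg")
```

Under that assumption, the required increase in seasonal flux would be somewhere between roughly 1.3 and 1.9 petagrams.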

Zeng and co-workers followed a more 
‘bottom-up’ approach, adapting a terrestrial 
biosphere model known as VEGAS to include 
a simple representation of changing agricul- 
tural management practices for a generic crop 
functional type (a single description that rep- 
resents an average of the growth character- 
istics of all crops). According to their study, 
enhanced agricultural productivity in the 
mid-latitudes contributes about 45% of the 
increasing amplitude of global net surface car- 
bon fluxes between 1961 and 2010, compared 
with 29% from climate change and 26% from 
CO₂ fertilization (increased photosynthesis
caused by rising atmospheric CO₂ levels).
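
The attribution quoted above can be laid out as a small table to confirm that the three shares account for the whole increase. This is only an illustrative tabulation of the percentages given in the text, not output from the VEGAS model; the dictionary layout and variable names are assumptions made for the example.

```python
# Shares of the increase in seasonal flux amplitude, 1961-2010, as quoted
# in the text for the VEGAS simulation. Layout and names are illustrative.
contributions = {
    "agricultural intensification": 45,
    "climate change": 29,
    "CO2 fertilization": 26,
}

total = sum(contributions.values())
largest = max(contributions, key=contributions.get)

# Print the drivers from largest to smallest share.
for driver, share in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{driver:30s} {share:3d}%")
print(f"total: {total}%  largest driver: {largest}")
```

The shares sum to 100%, so in this simulation the three drivers together are treated as explaining the full increase in amplitude.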

Although both studies highlight the influ- 
ence of agricultural intensification, they cal- 
culate considerably different values for its 
contribution to the increasing amplitude. Why 
is this? Gray et al. focused on the change in 
productivity in the extratropics, where MWRS 
accounts for only 68% of dry biomass produc- 
tion from crops — which, as they point out, 
may lead to a substantial underestimate in 
their proposed contribution. Zeng and col- 
leagues, however, performed a global simula- 
tion with a generic crop model and assumed 
that crop growth is driven solely by favourable 
climate conditions. This may bias their results 
towards higher carbon uptake, because they 
do not account for winter wheat varieties that 
are commonly grown during the period of net 
carbon release. 

So is the contribution of agriculture to the
increasing seasonal amplitude of atmospheric
CO2 closer to 20%, as Gray and co-workers
estimate, or around 50%, in line with Zeng and
colleagues’ result? The jury is still out. Top-down,
data-driven approaches, such as those
used by Gray et al., conceivably provide the best
available crop-specific estimates. Process-based
modelling frameworks are complementary;
their strength lies in their potential to examine
the relative influence of all possible causal
mechanisms, as undertaken by Zeng and co-workers.
This requires the processes to be
accurately represented, but current-generation
terrestrial biosphere models vary in their sensitivity
to temperature, precipitation and CO2
fertilization. Moreover, the effects of nutrient
limitation, and of changes in the age distribution
and management of forests, are often missing
or inadequately represented in models. All
of these issues may affect simulations of the
temporal dynamics of carbon fluxes.

The terrestrial biosphere is thought to be the
main driver of changes in atmospheric CO2
seasonality in the Northern Hemisphere.
However, we have not yet clearly differentiated
between the many contributory effects, such
as increased growing-season length and
changing rates of respiration due to warmer
temperatures; enhanced plant growth caused
by climate change, CO2 fertilization and/or the
deposition of nitrogen compounds from the
atmosphere; and human-induced disturbance
of the natural ecosystem, for example
from fire or grazing. The intensification of
agricultural productivity must now join the list.

Finally, an atmospheric-transport model
that accounts for complex mixing processes
is necessary to properly assess the different
contributions to increased seasonality of
atmospheric CO2 concentrations and their
spatial distribution. Shifts in the seasonal
variations of fossil-fuel emissions and ocean
CO2 fluxes may have been overlooked, and the
influence of tropical regions, although less seasonal,
should be considered in future studies.
Leaf veins share the time of day


Techniques for isolating and analysing leaf cell types have now been developed, 
leading to the discovery that circadian clocks in the plant vasculature 
communicate with and regulate clocks in neighbouring cells. SEE LETTER P.419 


MARIA C. MARTI & ALEX A. R. WEBB 


The flowering plants in our gardens and
in the countryside provide us with a
colourful landscape, and are often
thought of as nothing more than a dormant 
backdrop to our lives. But beneath their attrac- 
tive exteriors, plants are capable of complex 
behaviour, such as measuring time. In this 
issue, Endo et al. (page 419) identify circadian
clocks in leaf veins that signal to neighbour- 
ing cells — an indication that plant circadian 
clocks might be organized into a hierarchical 
system. 

Plant leaves are sophisticated organs 
comprised of several cell types, each with a 
different function. Epidermal cells line the 
leaf surface, with the bulk of the leaf being 
composed of mesophyll cells, which are 
responsible for photosynthesis. In addition, the 
leaves and stem are infiltrated by the veins of 
the plant vasculature, which transports water 
and molecules such as sugars around the plant. 
Endo and colleagues developed a method for
efficiently isolating epidermal, mesophyll and 
vasculature cells from Arabidopsis thaliana 
plants, allowing them to study spatiotemporal 
gene expression and circadian-clock regula- 
tion at high resolution. 

Multicellular organisms ensure that cells 
are performing the correct processes at the 
right time of day through their circadian 
clocks, which have a period of approximately 
24 hours, allowing anticipation of dawn 
and dusk. The timing of about 30% of gene 
activity in plants is modulated by circadian 
clocks. A clock’s core consists of around 
20 genes divided into two interlocking 
pathways — a morning loop of genes that are 
active during daylight hours and an evening 
loop active from dusk. 

The researchers observed that morning-loop
genes such as CCA1 were more active
in the mesophyll than in the vasculature,
whereas the opposite was true of evening-loop
genes such as TOC1. Furthermore, when
they measured genome-wide gene activity,
the authors found differential gene expression
in each tissue. Output genes (those regulated
by circadian clocks) that were more active in
the mesophyll than in the vasculature tended
to be expressed in the morning, whereas
output genes more active in the vasculature
were likely to be expressed in the evening.
This suggests that differences in the circadian
clock of each tissue cause differential gene
expression (Fig. 1).

Figure 1 | Time for a talk. a, Leaves are comprised of epidermal cells,
mesophyll cells and the cells that make up the vasculature. b, Endo et al. report
differences in the circadian clocks that regulate the vasculature and mesophyll.
In the vasculature, evening-loop genes such as TOC1 are more active than
morning-loop genes such as CCA1 (loops indicated by white arrows), and so
there is greater overall gene activity in the vasculature in the evening than in the
morning (represented by yellow arrows). The opposite is true in the mesophyll.
The authors show that the vasculature clock communicates with and regulates
the mesophyll clock, but they did not find evidence that the mesophyll could
regulate the vasculature, suggestive of hierarchical control.

Evidence of differences between the
circadian clocks of leaves and roots suggests
that cell-type-specific clocks regulate
specialized plant-cell functions. The activation
of mesophyll-specific genes in the morning
might reflect the need for photosynthesis to
begin around dawn. Enhanced evening-loop
activity in the vasculature might be required to
ensure accurate measurement of the timing of
dusk, and therefore of day length, a measurement
that controls flower production in many
species. Indeed, Endo and colleagues demonstrated
that disruption of the circadian clock
in the vasculature, but not in the mesophyll,
epidermis, stem or root, affected the timing of
flower production in Arabidopsis. The vascular
clock might also regulate vascular-specific
night-time activities, such as refilling vessels to
remove air bubbles.

A common feature of multicellular
organisms is that the circadian clocks of
neighbouring cells can communicate with
each other, forming synchronized groups of
cells that either create a robust oscillating system
or convey information about time to distant
organs. In mammals, for example, a coupled
clock in the hypothalamus region of the brain
regulates clocks in other tissues. In plants, weak
communication between individual circadian
clocks has been observed, and it has been
proposed that circadian clocks in the leaves
are masters over those in the roots. Endo
et al. now provide experimental evidence for
local coupling of clock systems in plants. They
stopped the vascular clock by overexpressing
CCA1 in cells of the vasculature, and demonstrated
that this also inhibited the clocks of the
neighbouring mesophyll cells. This might be
achieved through chemical signalling, perhaps
involving sugars, because leaf-cell clocks are
sensitive to changing sugar levels.

The communication between the circadian
clocks in the vasculature and mesophyll might
be hierarchical. Overexpression of CCA1 in the
mesophyll had little effect on the vascular circadian
clock. Because the clocks of the two cell
types are differentially enriched for morning
and evening components, it will be interesting
to determine whether signalling occurs from
the vasculature if TOC1 is overexpressed in a
cell-type-specific manner.

Is the plant vasculature an interconnected
system, generating robust oscillations that
regulate other cells, similar to the circadian
pacemakers of mammalian brains? Or might
it function as a pipeline that disseminates timing
signals, analogous to the circadian clocks
of red blood cells? The vasculature is certainly
more than just sophisticated plumbing; it acts
as a conduit for rapid electrical, oxidative
and ionic signals, reminiscent of a nervous
system. However, the analogies to mammalian
systems break down under scrutiny. Plants do
not require the rapid responses provided by a
nervous system, because their movements,
usually mediated by growth, are slower than
those of animals.

Endo and colleagues’ work will make it
easier to study individual plant cell types. By
optimizing protocols for dissection, sonication
and enzyme treatments that degrade the
cell wall, they have considerably shortened the
time required to isolate cells for RNA measurement.
Furthermore, the researchers have
developed imaging techniques for studying
spatiotemporal gene regulation in plants.
They engineered two halves of the luminescent
protein luciferase such that one half was
produced only in a specific cell type and the
other half only when the promoter that drives
either CCA1 or TOC1 gene expression was
active. Because both halves must be produced
in a cell for luminescence to occur, light emission
can be used as a measure of the activity of
a circadian-clock gene in a given cell type. This
approach can be extended to other cell types
and responses, such as stress and developmental
signals, simply by using different promoters
to drive the two halves of luciferase.

The ability to study individual leaf cell 
types in detail will surely lead to a deeper 
understanding of circadian regulation of gene 
activity, development and photosynthesis. 
The first steps will be to determine why leaf 
circadian clocks communicate, and which 
signalling pathways convey information 
about time. Such knowledge is sorely needed 
if the challenge of improving crops to feed the 
growing human population is to be met.
A comparative encyclopedia of DNA elements in the mouse genome


A list of authors and their affiliations appears at the end of the paper 


The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism 
in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and 
species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has 
mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication 
domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not 
only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree 
of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organi- 
zation. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and 
provide a general resource for research into mammalian biology and mechanisms of human diseases. 


Despite the widespread use of mouse models in biomedical research,
the genetic and genomic differences between mice and humans remain
to be fully characterized. At the sequence level, the two species have
diverged substantially: approximately one half of human genomic DNA
can be aligned to mouse genomic DNA, and only a small fraction
(3–8%) is estimated to be under purifying selection across mammals. At
the cellular level, a systematic comparison is still lacking. Recent studies
have revealed divergent DNA binding patterns for a limited number of
transcription factors across multiple related mammals, suggesting
potentially wide-ranging differences in cellular functions and regulatory
mechanisms. To fully understand how DNA sequences contribute
to the unique molecular and cellular traits in mouse, it is crucial
to have a comprehensive catalogue of the genes and non-coding functional
sequences in the mouse genome.

Advances in DNA sequencing technologies have led to the development
of RNA-seq (RNA sequencing), DNase-seq (DNase I hypersensitive
sites sequencing), ChIP-seq (chromatin immunoprecipitation followed
by DNA sequencing), and other methods that allow rapid and genome-wide
analysis of transcription, replication, chromatin accessibility, chromatin
modifications and transcription factor binding in cells. Using
these large-scale approaches, the ENCODE consortium has produced
a catalogue of potential functional elements in the human genome.
Notably, 62% of the human genome is transcribed in one or more cell
types, and 20% of human DNA is associated with biochemical signatures
typical of functional elements, including transcription factor binding,
chromatin modification and DNase hypersensitivity. The results
support the notion that nucleotides outside the mammalian-conserved
genomic regions could contribute to species-specific traits.

We have applied the same high-throughput approaches to over 100 
mouse cell types and tissues, producing a coordinated group of data
sets for annotating the mouse genome. Integrative analyses of these 
data sets uncovered widespread transcriptional activities, dynamic gene 
expression and chromatin modification patterns, abundant cis-regulatory 
elements, and remarkably stable chromosome domains in the mouse 
genome. The generation of these data sets also allowed an unprecedented 
level of comparison of genomic features of mouse and human. Described 
in the current manuscript and companion works, these comparisons 
revealed both conserved sequence features and widespread divergence 
in transcription and regulation. Some of the key findings are: 


• Although much conservation exists, the expression profiles of many
  mouse genes involved in distinct biological pathways show considerable
  divergence from their human orthologues.
• A large portion of the cis-regulatory landscape has diverged between
  mouse and human, although the magnitude of regulatory DNA divergence
  varies widely between different classes of elements active in
  different tissue contexts.
• Mouse and human transcription factor networks are substantially
  more conserved than cis-regulatory DNA.
• Species-specific candidate regulatory sequences are significantly
  enriched for particular classes of repetitive DNA elements.
• Chromatin state landscape in a cell lineage is relatively stable in both
  human and mouse.
• Chromatin domains, interrogated through genome-wide analysis of
  DNA replication timing, are developmentally stable and evolutionarily
  conserved.


Overview of data production and initial processing 


To annotate potential functional sequences in the mouse genome, we 
used ChIP-seq, RNA-seq and DNase-seq to profile transcription factor 
binding, chromatin modification, transcriptome and chromatin acces- 
sibility in a collection of 123 mouse cell types and primary tissues (Fig. 1a, 
Supplementary Tables 1-3). Additionally, to interrogate large-scale chro- 
matin organization across different cell types, we also used a microarray- 
based technique to generate replication-timing profiles in 18 mouse tissues 
and cell types (Supplementary Table 3). Altogether, we produced over
1,000 data sets. The list of the data sets and all the supporting material
for this manuscript are also available at http://mouseencode.org.
Below we briefly outline the experimental approach and initial data pro- 
cessing for each class of sequence features. 


RNA transcriptome 

To comprehensively identify the genic regions that produce transcripts
in the mouse genome, we performed RNA-seq experiments in 69 different
mouse tissues and cell types with two biological replicates each
(Supplementary Table 3, Supplementary Information) and uncovered 436,410
contigs (Supplementary Table 4). Confirming previous reports
and similar to the human genome, the mouse genome is pervasively
transcribed (Fig. 1b), with 46% capable of producing polyadenylated
messenger RNAs (mRNA). By comparison, 39% of the human genome
is devoted to making mRNAs. In both species, the vast majority
(87–93%) of exonic nucleotides were detected as transcribed, confirming
the sensitivity of the approach. However, a higher percentage of
intronic sequences were detected as transcribed in the mouse, and this
might be owing to a greater sequencing depth and broader spectrum of
biological samples analysed in mouse (Fig. 1b).

Figure 1 | Overview of the mouse ENCODE data sets. a, A genome browser
snapshot shows the primary data and annotated sequence features in the mouse
CH12 cells (Methods). b, Chart shows that much of the human and mouse
genomes is transcribed in one or more cell and tissue samples. c, A bar chart
shows the percentages of the mouse genome annotated as various types of
cis-regulatory elements (Methods). DHS, DNase hypersensitive sites; TF,
transcription factor. d, Pie charts show the fraction of the entire genome that is
covered by each of the seven states in the mouse embryonic stem cells (mESC)


Candidate cis-regulatory sequences 

To identify potential cis-regulatory regions in the mouse genome, we 
used three complementary approaches that involved mapping of chro- 
matin accessibility, specific transcription factor occupancy sites and 
histone modification patterns. All of these approaches have previously 
been shown to uncover cis-regulatory elements with high accuracy and
sensitivity.

By mapping DNase I hypersensitive sites (DHSs) in 55 mouse cell and
tissue types, we identified a combined total of ~1.5 million distinct
DHSs at a false discovery rate (FDR) of 1% (Supplementary Table 5).
Genomic footprinting analysis in a subset (25) of these cell types further
delineated 8.9 million distinct transcription factor footprints.
De novo derivation of a cis-regulatory lexicon from mouse transcription
factor footprints revealed a recognition repertoire nearly identical
with that of the human, including both known and novel recognition
motifs.

We used ChIP-seq to determine the binding sites for a total of 37 
transcription factors in various subsets of 33 cell/tissue types. Of these 
37 transcription factors, 24 were also extensively mapped in the murine 
and human erythroid cell models (MEL and K562) and B-lymphoid cell
lines (CH12 and GM12878). In total we defined 2,107,950 discrete
ChIP-seq peaks, representing differential cell/tissue occupancy patterns 
of 280,396 distinct transcription factor binding sites (Supplementary 
Methods and Supplementary Table 6). 

We also performed ChIP-seq for as many as nine histone H3 modifications
(H3K4me1, H3K4me2, H3K4me3, H3K9me3, H3K27ac,
H3K27me3, H3K36me3, H3K79me2 and H3K79me3) in up to 23 mouse
tissues and cell types per mark. We applied a supervised machine learning
technique, random-forest based enhancer prediction from chromatin
state (RFECS), to three histone modifications (H3K4me1, H3K4me3
and H3K27ac), identifying a total of 82,853 candidate promoters and
291,200 candidate enhancers in the mouse genome (Supplementary
Tables 7 and 8). To functionally validate the predictions, we randomly
selected 76 candidate promoter elements (average size 1,000 bp,
Supplementary Table 9) and 183 candidate enhancer elements (average
size 1,000 bp, Supplementary Table 10). For candidate promoter elements,
we cloned these previously unannotated sequences into reporter constructs,
and performed luciferase reporter assays via transient transfection
in pertinent mouse cell lines. For candidate enhancer elements, we
performed functional validation assays using a high-throughput method
(see Supplementary Methods). Overall, 66/76 (87%) candidate promoters
and 129/183 (70.5%) candidate enhancers showed significant activity
in these assays, compared to 2/30 randomly selected negative controls
(Supplementary Fig. 1c).
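
As a quick arithmetic check on the validation figures quoted above, the short Python sketch below recomputes each rate from the counts given in the text. The function name `validation_rate` and the dictionary layout are illustrative choices, not part of the original analysis, and no statistical test is implied.

```python
def validation_rate(active: int, tested: int) -> float:
    """Return the percentage of tested elements that showed activity."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * active / tested


# Counts quoted in the text (active, tested).
counts = {
    "candidate promoters": (66, 76),
    "candidate enhancers": (129, 183),
    "negative controls": (2, 30),
}

# Percentage of elements with significant reporter activity in each group.
rates = {name: validation_rate(a, n) for name, (a, n) in counts.items()}

for name, rate in rates.items():
    active, tested = counts[name]
    print(f"{name}: {active}/{tested} = {rate:.1f}%")
```

The promoter and enhancer rates round to the 87% and 70.5% reported, while the negative controls come out below 7%, which is the contrast the validation relies on.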

Collectively, our studies assigned potential regulatory function to 12.6% 
of the mouse genome (Fig. 1c). 
Transcription factor networks


We explored the transcription factor networks and combinatorial transcription
factor binding patterns in the mouse samples in two companion
papers, and compared these networks to regulatory circuitry models
generated for the human genome. From genomic footprints, we constructed
a transcription-factor-to-transcription-factor cross-regulatory
network in each of 25 cell/tissue types for a total of ~500 transcription
factors with known recognition sequences. Analyses of these networks
revealed regulatory relationships between transcription factor genes that
are strongly preserved in human and mouse, in spite of the extensive
plasticity of the cis-regulatory landscape (detailed below). Whereas only
22% of transcription factor footprints are conserved, nearly 50% of
cross-regulatory connections between mouse transcription factors are conserved
in human through the innovation of novel binding sites. Moreover,
analysis of network motifs shows that larger-scale architectural features of
mouse and human transcription factor networks are strikingly similar.


Chromatin states 


We produced integrative maps of chromatin states in 15 mouse tissue
and cell types and six human cell lines (Supplementary Table 11), using
a hidden Markov model (chromHMM) that allowed us to segment
the genome in each cell type into seven distinct combinations of chromatin
modification marks (or chromatin states). One state is characterized
by the absence of any chromatin marks, while every other state
features either predominantly one modification or a combination of two
modifications (Extended Data Table 1, Supplementary Information).
The portion of the genome in each chromatin state varied with cell type
(Fig. 1d, Supplementary Fig. 2). Similar proportions of the genome are
found in the active states in each cell type, for both mouse and human.
Interestingly, excluding the ‘unmarked’ state, the fraction of each genome
that is in the H3K27me3-dominated, transcriptionally repressed state is
the most variable, suggesting a profound role of transcriptional repression
in shaping the cis-regulatory landscape during mammalian development.


Replication domains 

Replication timing, the temporal order in which megabase-sized genomic
regions replicate during S phase, is linked to the spatial organization
of chromatin in the nucleus, serving as a useful proxy for tracking
differences in genome architecture between cell types. Since different
types of chromatin are assembled at different times during S phase,
changes in replication timing during differentiation could elicit changes
in chromatin structure across large domains. We obtained 36 mouse
and 31 human replication-timing profiles covering 11 and 9 distinct
stages of development, respectively (Supplementary Table 12). We defined
‘replication boundaries’ as the sites where replication profiles change
slope from synchronously replicating segments (discussed later). A total
of 64,535 and 50,194 boundaries identified across all mouse and human
data sets, respectively, were mapped to 4,322 and 4,675 positions, with
each cell type displaying replication-timing transitions at 50–80% of
these positions (Fig. 1e).


Annotation of orthologous coding and non-coding genes 


To facilitate a systematic comparison of the transcriptome, cis-regulatory
elements and chromatin landscape between the human and mouse
genomes, we built a high-quality set of human–mouse orthologues of
protein-coding and non-coding genes. The list of protein-coding orthologues,
based on phylogenetic reconstruction, contains a total of 15,736
one-to-one and a smaller set of one-to-many and many-to-many orthologue
pairs (Supplementary Tables 13–15). We also inferred orthologous
relationships among short non-coding RNA genes using a similar
phylogenetic approach. We established one-to-one human–mouse orthologues
for 151,257 internal exon pairs (Supplementary Table 16) and
204,887 intron pairs (Supplementary Table 17), and predicted 2,717 (3,446)
novel human (respectively, mouse) exons (Supplementary Table 18).
Additionally, we mapped the 17,547 human long non-coding RNA
(lncRNA) transcripts annotated in Gencode v10 onto the mouse genome.
We found 2,327 (13.26%) human lncRNA transcripts (corresponding
to 1,679, or 15.48%, of the lncRNA genes) homologous to 5,067 putative
mouse transcripts (corresponding to 3,887 putative genes) (Supplementary
Fig. 3, Supplementary Table 19). Consistent with previous observations,
only a small fraction of lncRNAs are constrained at the primary sequence
level, with rapid evolutionary turnover. Other comparisons of human
and mouse transcriptomes, covering areas including pre-mRNA splicing,
antisense and intergenic RNA transcription, are detailed in an associated
paper.


Divergent and conserved gene expression patterns 


Figure 2 | Comparative analysis of the gene expression programs in human
and mouse samples. a, Principal component analysis (PCA) was performed
for RNA-seq data for 10 human and mouse matching tissues. The expression
values are normalized across the entire data set. Solid squares denote human
tissues. Open squares denote mouse tissues. Each category of tissue is
represented by a different colour. b, Gene expression variance decomposition
(see Methods) estimates the relative contribution of tissue and species to the
observed variance in gene expression for each orthologous human–mouse gene
pair. Green dots indicate genes with higher between-tissue contribution and
red dots genes with higher between-species contributions. c, Neighbourhood
analysis of conserved co-expression (NACC) in human and mouse samples.
The distribution of NACC scores for each gene is shown. d, A scatter plot shows
the average of NACC score over the set of genes in each functional gene
ontology category. Highlighted are those biological processes that tend to be
more conserved between human and mouse and those processes that have been
less conserved (see Supplementary Table 21 for list of genes).

Previous studies have revealed remarkable examples of species-specific
gene expression patterns that underlie phenotypic changes during
evolution. In these cases changes in expression of a single gene between
closely related species led to adaptive changes. However, it is not clear how
extensive the changes in expression patterns are between more distantly
related species, such as mouse and human, with some studies emphasizing
similarities in transcriptome patterns of orthologous tissues and
others emphasizing substantial interspecies differences. Our initial
analyses revealed that gene expression patterns tended to cluster more
by species than by tissue (Fig. 2a). To resolve the sets of genes
contributing to different components in the clustering, we employed
variance decomposition (see Methods) to estimate, for each orthologous
human–mouse gene pair, the proportion of the variance in expression
that is contributed by tissue and by species (Fig. 2b). This analysis revealed
the sets of genes whose expression varies more across tissues than between
species, and those whose expression varies more between species than
across tissues. As expected, the clustering of the RNA-seq samples is dominated
either by species or tissues, depending on the gene set employed
(Extended Data Fig. 1a, b). Furthermore, removal of the ~4,800 genes
that drive the species-specific clustering (see ref. 47, Supplementary Fig. 1d
therein) or normalization methods that reduce the species effects reveal
tissue-specific patterns of expression in the same samples (Extended Data
Fig. 1c). Categorizing orthologous gene pairs into these groups should
enable more informative translation of research results between mouse
and human. In particular, for gene pairs whose variance in expression
is largest between tissues (and less between species), mouse should be a
particularly informative model for human biology. In contrast, interpretation
of studies involving genes whose variance in expression is larger
between species needs to take into account the species variation. The
relative contributions of species-specific and tissue-specific factors to
each gene’s expression are further explored in two associated papers.
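
To make the idea of splitting expression variance into tissue and species components concrete, the following Python sketch applies a simple two-way sum-of-squares decomposition to one orthologous gene pair. This is only an illustration under stated assumptions: the exact procedure used in the study is described in its Methods and is not reproduced here, and the function name `variance_fractions`, the balanced species-by-tissue layout and the toy values are all hypothetical.

```python
import numpy as np


def variance_fractions(expr):
    """Split one gene pair's expression variance into tissue and species parts.

    expr: 2-D array, rows = species (e.g. human, mouse), columns = matched tissues.
    Returns (tissue_fraction, species_fraction, residual_fraction).
    Assumption: a balanced two-way layout with one value per species and tissue.
    """
    expr = np.asarray(expr, dtype=float)
    n_species, n_tissues = expr.shape
    grand_mean = expr.mean()
    total_ss = ((expr - grand_mean) ** 2).sum()
    if total_ss == 0:
        return 0.0, 0.0, 0.0
    # Sum of squares explained by species means and by tissue means.
    species_ss = n_tissues * ((expr.mean(axis=1) - grand_mean) ** 2).sum()
    tissue_ss = n_species * ((expr.mean(axis=0) - grand_mean) ** 2).sum()
    residual_ss = total_ss - species_ss - tissue_ss
    return tissue_ss / total_ss, species_ss / total_ss, residual_ss / total_ss


# Toy gene whose expression differs across tissues but not between species.
tissue_driven = variance_fractions([[1, 5, 9], [1, 5, 9]])
# Toy gene whose expression differs between species but not across tissues.
species_driven = variance_fractions([[2, 2, 2], [8, 8, 8]])
```

Genes resembling the first toy case would fall among those for which mouse is expected to be a particularly informative model, whereas genes resembling the second call for attention to species variation.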

To further identify genes with conserved expression patterns and 
those that have diverged between humans and mice, we developed a 
novel method, referred to as neighbourhood analysis of conserved co- 
expression (NACC), to compare the transcriptional programs of orth- 
ologous genes in a way that did not require precisely matched cell lines, 
tissues or developmental stages, as long as a sufficiently diverse panel of 
samples is used in each species (Supplementary Methods). Observing 
that the orthologues of most sets of co-expressed genes in one species 
remained significantly correlated across samples in the other species, 
we use the mean of these small correlated sets of orthologous genes as a 
reference expression pattern in the other species. We compute Euclidean 
distance to the reference pattern in the multi-dimensional tissue/gene 
expression space as a relative measure of conservation of expression of 
each gene. Specifically, for each human gene (the test gene), we defined 
the most similarly expressed set of genes (n = 20) across all the human 
samples as that gene’s co-expression neighbourhood. We then quantify 
the average distance between the transcript levels of the mouse ortho- 
logue of the test gene and the transcript levels of each mouse orthologue 
of the neighbourhood genes across the mouse samples. We then invert 
the analysis, and choose a mouse test gene and define a similar gene co- 
expression neighbourhood in the mouse samples, and calculate the 
average distance between the expression of orthologues of the test gene 
and expression of neighbourhood genes across the human samples. The 
average change in the human-to-mouse and mouse-to-human distances, 
referred to herein as the NACC score, is a symmetric measure of the degree
of conservation of co-expression for each gene. The distribution of this 
quantity for each gene is shown in Fig. 2c, showing that genes in one 
species show a strong tendency to be co-expressed with orthologues of 
similarly expressed genes in the other species compared to random genes 
(also see Supplementary Information). We quantify the degree to which 
a specific biological process diverges between human and mouse as the 
average NACC scores of genes in each gene ontology category by cal- 
culating a z-score using random sampling of equal size sets of genes. 
Figure 2d shows that genes coding for proteins in the nuclear and intra- 
cellular organelle compartments, and involved in RNA processing, nucleic 
acid metabolic processes, chromatin organization and other intracel- 
lular metabolic processes, tend to exhibit more similar gene expression 
patterns between human and mouse. On the other hand, genes involved 
in extracellular matrix, cellular adhesion, signalling receptors, immune 
responses and other cell-membrane-related processes are more diverged 
(for a complete list of all GO categories and conservation analysis, see 
Supplementary Table 21). As a control, when we applied the NACC 
analysis to two different replicates of RNA-seq data sets from the same 
species, no difference in biological processes could be detected (Supplementary 
Fig. 5). 
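
The neighbourhood procedure described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes each species' expression data is a dictionary keyed by a shared orthologue identifier, and it averages the two cross-species distances directly, without any normalization the Supplementary Methods may apply. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbourhood(expr, gene, n):
    """The n genes whose profiles in `expr` lie closest to `gene`."""
    others = [g for g in expr if g != gene]
    return sorted(others, key=lambda g: euclidean(expr[gene], expr[g]))[:n]

def one_way_distance(expr_a, expr_b, gene, n):
    """Define the neighbourhood in species A, then measure the mean distance
    from the gene to those same neighbours in species B."""
    neigh = neighbourhood(expr_a, gene, n)
    return sum(euclidean(expr_b[gene], expr_b[g]) for g in neigh) / len(neigh)

def nacc_like_score(human, mouse, gene, n=20):
    """Symmetric score: mean of the human-to-mouse and mouse-to-human distances.
    Assumption: a plain average stands in for the published NACC definition."""
    return 0.5 * (one_way_distance(human, mouse, gene, n)
                  + one_way_distance(mouse, human, gene, n))
```

Because distances are only ever computed within one species, the two sample panels may differ in size and composition, which is the property the text emphasizes.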

Several lines of evidence indicate that NACC is a sensitive and 
robust method to detect conserved as well as diverged gene expression 
patterns from a panel of imperfectly matched tissue samples. First, when 
we applied NACC to a set of simulated data sets, we found that NACC 
is robust to the diversity and conservation of the mouse-human sample 
panel (Supplementary Fig. 6). Second, we randomly sampled subsets 
of the full panel of samples and demonstrated that the categories of 
human-mouse divergence shown in Fig. 2d are robust to the particu- 
lar sets of samples we selected (Supplementary Fig. 7). Third, when we 
repeated NACC on a limited collection of more closely matched tissues 
and primary cell types (see Supplementary Methods), the biological processes 
detected as conserved and species-specific in the larger panel of 
mismatched human-mouse samples are largely recapitulated, although 
some pathways are detected with somewhat less significance, probably 
owing to the smaller number of data sets used (Supplementary Fig. 8). 
In summary, the NACC results support and extend the principal com- 
ponent analysis, showing that while large differences between mouse 
and human transcriptome profiles can be observed (revealed in PC1), 
genes involved in distinct cellular pathways or functional groups exhibit 
different degrees of conservation of expression patterns between human 
and mouse, with some strongly preserved and others changing markedly. 
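
The category-level comparison described earlier (a z-score for each gene ontology category against random gene sets of equal size) can be illustrated as follows. This is a sketch under stated assumptions: `scores` is a hypothetical mapping from gene to NACC score, and the number of random draws and the seed are arbitrary choices, not values from the paper.

```python
import random
import statistics

def category_z_score(scores, category, n_draws=1000, seed=0):
    """z-score of a category's mean score against random equal-size gene sets.

    scores   : dict mapping gene -> per-gene score (for example a NACC score)
    category : list of genes belonging to one annotation category
    """
    rng = random.Random(seed)
    genes = list(scores)
    observed = statistics.mean(scores[g] for g in category)
    # Null distribution: mean score of randomly drawn sets of the same size.
    null = [statistics.mean(scores[g] for g in rng.sample(genes, len(category)))
            for _ in range(n_draws)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

With a distance-based score, a strongly negative z-score would indicate a category whose genes are more conserved in co-expression than random sets, and a strongly positive one a more diverged category.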


Prevalent species-specific regulatory sequences along 
with a core of conserved regulatory sequences 


To better understand how divergence of cis-regulatory sequences is 
linked to the range of conservation patterns detected in comparisons 
of gene expression programs between species, we examined evolutionary 
patterns in our predicted regulatory sequences. Previous studies have 
identified a wide range of evolutionary patterns and rates for cis-regulatory 
regions in mammals, but there are still questions regarding the overall 
degree of similarity and divergence between the cis-regulatory 
landscapes in the mouse and human. The variety of assays and breadth of 
tissue and cell-type coverage in the mouse ENCODE data therefore pro- 
vide an opportunity to address this problem more comprehensively. 
We first determined sequence homology of the predicted cis-elements 
in the mouse and human genomes. We established one-to-one and one- 
to-many mapping of human and mouse bases derived from reciprocal 
chained blastz alignments and identified conserved cis-regulatory 
sequences. This analysis showed that 79.3% of chromatin-based enhancer 
predictions, 79.6% of chromatin-based promoter predictions, 67.1% 
of the DHS, and 66.7% of the transcription factor binding sites in the 
mouse genome have homologues in the human genome with at least 
10% overlapping nucleotides, while by random chance one expects 51.2%, 
52.3%, 44.3% and 39.3%, respectively (Fig. 3a; see Supplementary Information 
for details). With a more stringent cutoff that requires 50% alignment 
of nucleotides, we found that 56.4% of the enhancer predictions, 62.4% 
of promoter predictions, 61.5% of DHS, and 53.3% of the transcrip- 
tion factor binding sites have homologues, compared with an expected 
frequency of 34%, 33.8%, 33.6% and 33.7% by random chance (Sup- 
plementary Fig. 9). The candidate mouse regulatory regions with human 
homologues are listed in Supplementary Tables 22-25. Thus, between 
half and two-thirds of candidate regulatory regions demonstrate a sig- 
nificant enrichment in sequence conservation between human and mouse. 
The remaining half to one-third have no identifiable orthologous sequence. 
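
The homology criterion used above (an element counts as having a homologue when at least 10%, or more stringently 50%, of its nucleotides align) can be expressed as a short sketch. The input format, a list of `(length_bp, aligned_bp)` pairs, is an assumption made for illustration; the alignment step itself is not shown.

```python
def fraction_with_homologue(elements, min_aligned=0.10):
    """Share of elements whose aligned bases cover at least `min_aligned`
    of their length. Each element is a (length_bp, aligned_bp) pair."""
    hits = sum(1 for length, aligned in elements
               if aligned / length >= min_aligned)
    return hits / len(elements)

def fold_over_chance(observed, expected):
    """Ratio of the observed fraction to the fraction expected by chance."""
    return observed / expected

# Reported fractions for chromatin-based enhancer predictions.
print(f"10% cutoff: {fold_over_chance(0.793, 0.512):.2f}-fold over chance")
print(f"50% cutoff: {fold_over_chance(0.564, 0.340):.2f}-fold over chance")
```

Raising the cutoff lowers both the observed and the expected fractions, so the fold enrichment is the more comparable quantity across thresholds.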
The candidate regulatory regions in mouse with no orthologue in 
human could arise either because they were generated by lineage-specific 
events, such as transposition, or because the orthologue in the other spe- 
cies was lost. Species-specific cis-regulatory sequences have been reported 
before, but the fraction of regulatory sequences in this category remains 
debatable and may vary with different roles in regulation. We find that 
15% (12,387 out of 82,853) of candidate mouse promoters and 16.6% 
(48,245 out of 291,200) of candidate enhancers (both predicted by pat- 
terns of histone modifications) have no sequence orthologue in humans 
(Supplementary Tables 26, 28; for details, see Supplementary 
Methods). However, the question remains as to whether these 
species-specific elements are truly functional elements or simply corre- 
spond to false-positive predictions due to measurement errors or bio- 
logical noise. Supporting the function of mouse-specific cis elements, 
18 out of 20 randomly selected candidate mouse-specific promoters tested 
positive using reporter assays in mouse embryonic stem cells, where they 
were initially identified (Fig. 3b, Supplementary Table 27). Further, when 
these 18 mouse-specific promoters were tested using reporter assays in 
human embryonic stem cells, all of them also exhibited significant 
promoter activities (Extended Data Fig. 2a, Supplementary Table 27), 
indicating that the majority of candidate mouse-specific promoters are 
indeed functional sequences, which are either gained in the mouse lineage 
Figure 3 | Comparative analysis of the cis-elements predicted in the human 
and mouse genome. a, Chart shows the fractions of the predicted mouse 
cis-regulatory elements with homologous sequences in the human genome 
(Methods). TFBS, transcription factor binding site. b, A bar chart shows the 
fraction of the DNA fragments tested positive in the reporter assays performed 
either using mouse embryonic stem cells (mESCs) or mouse embryonic 
fibroblasts (MEF). c, A chart shows the gene ontology (GO) categories enriched 
near the predicted mouse-specific enhancers. d, A bar chart shows the 
percentage of the predicted mouse-specific enhancers containing various 
subclasses of LTR and SINE elements. As control, the predicted mouse cis 
elements with homologous sequences in the human genome or random 
genomic regions are included. 


or lost in the human lineage. Similarly, a majority of the candidate mouse- 
specific enhancers discovered in embryonic stem cells are also likely bona 
fide cis elements, as 70.2% (26 out of 37) of the candidate enhancers randomly 
selected from this group were found to exhibit enhancer activities in 
reporter assays (Fig. 3b, Supplementary Table 29). Like the candidate 
mouse-specific promoters, 61.5% (16 out of 26) of the candidate mouse- 
specific enhancers also show enhancer activities in human embryonic 
stem cells (Extended Data Fig. 2a). 

We next tested whether the rapidly diverged cis-regulatory elements 
would correspond to the same cellular pathways shown to be less con- 
served by the NACC analysis of gene expression programs. Indeed, 
gene ontology analysis revealed that the mouse-specific regulatory ele- 
ments are significantly enriched near genes involved in immune func- 
tion (Fig. 3c), in agreement with the divergent transcription patterns 
for these genes reported earlier and a previous report based on a smaller 
number of primate-specific candidate regulatory regions. This suggests 
that regulation of genes involved in immune function tends to be 
species-specific, just as the protein-coding sequences for immunity, 
pheromones and other environmental genes are frequent targets 
for adaptive selection in each species. The target genes for mouse-specific 
transcription factor binding sites (Supplementary Table 30) are 
enriched in molecular functions such as histone acetyltransferase activity 
and high-density lipoprotein particle receptor activity, in addition to 
immune function (IgG binding). 

We next investigated the mechanisms generating mouse-specific cis- 
regulatory sequences: loss in human, gain in mouse, or both. 89% (42,947 
out of 48,245) of mouse-specific enhancers and 85% (10,535 out of 
12,387) of mouse-specific promoters overlap with at least one class of 
repeat elements (compared to 78% by random chance). Confirming 
earlier reports, we found that mouse-specific candidate promoters 
and enhancers are significantly enriched for repetitive DNA sequences, 
with several classes of repeat DNA highly represented (Fig. 3d and 
Extended Data Fig. 2b). Furthermore, mouse-specific transcription factor 
binding sites are highly enriched in mobile elements such as short 
interspersed elements (SINEs) and long terminal repeats (LTRs). 
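
To give a sense of scale for the repeat overlap reported above (89% and 85% observed against 78% expected by chance), the counts can be compared with the chance rate using a normal approximation to the binomial. This is an illustrative calculation only: it treats the 78% background as an exactly known rate, which is an assumption, and it is not the test used in the paper.

```python
import math

def enrichment_z(k, n, p0):
    """Normal-approximation z-score for observing k successes in n trials
    when the chance rate is p0 (assumed here to be known exactly)."""
    return (k - n * p0) / math.sqrt(n * p0 * (1 - p0))

# Counts taken from the text; 0.78 is the reported chance overlap.
enhancers = enrichment_z(42_947, 48_245, 0.78)
promoters = enrichment_z(10_535, 12_387, 0.78)
print(f"enhancers: {42_947 / 48_245:.1%} observed, z = {enhancers:.1f}")
print(f"promoters: {10_535 / 12_387:.1%} observed, z = {promoters:.1f}")
```

With samples this large, even a difference of a few percentage points from the background corresponds to a very large z-score, so the effect size (89% versus 78%) is the more informative figure.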

The 50% to 60% of candidate regulatory regions with sequences 
conserved between mouse and human are a mixture of (1) sequences 
whose function has been preserved via strong constraint since these 
species diverged, (2) sequences that have been co-opted (or exapted) 
to perform different functions in the other species, and (3) sequences 
whose orthologue in the other species no longer has a discernable func- 
tion, but divergence by evolutionary drift has not been sufficient to pre- 
vent sequence alignment between mouse and human. Several companion 
papers delve deeply into these issues. In particular, ref. 23 shows that 
the conservation of transcription factor binding at orthologous positions 
(falling in category (1)) is associated with pleiotropic roles of enhancers, 
as evidenced by activity in multiple tissues. References 22,49 describe 
the exaptation of conserved regulatory sequences for other functions. 

We surveyed the conservation of function in the subset of mouse 
candidate cis elements that have sequence counterparts in the human 
genome. Of the 51,661 chromatin-based promoter predictions that have 
human orthologues, 44% (22,655) of them are still predicted as promo- 
ters in human on the basis of the same analysis of histone modifications 
(Supplementary Table 31, see Supplementary Methods for details). Of 
the 164,428 chromatin-based enhancer predictions that have human 
orthologues, 40% (64,962) of them are predicted as an enhancer in 
human (Supplementary Table 32). The remaining 56-60% of candidate 
mouse regulatory regions with a human orthologue fall into category 
(2) or (3) (see earlier), that is, the orthologous sequence in human either 
performs a different function or does not maintain a detectable function. 

One caveat of the above observation is that the tissues or cell sam- 
ples used in the survey were not perfectly matched. To better examine 
the conservation of biochemical activities among these predicted cis- 
regulatory elements with orthologues between mouse and human, we 
analysed the chromatin modifications at the promoter or enhancer 
predictions in a broad set of 23 mouse tissue and cell types with the 
neighbourhood co-expression association analysis (NACC) method 
described above. Instead of gene expression levels, we selected the his- 
tone modification H3K27ac as an indicator of promoter or enhancer 
activity as previously reported. As shown in Fig. 4a, the promoter predictions 
(blue) show a significantly higher correlation in the level of 
H3K27ac in human and mouse than the random controls (red). Simi- 
larly, most chromatin-based enhancer predictions in the mouse genome 
exhibit conserved chromatin modification patterns in the human, albeit 
to a lesser degree than the promoters (Fig. 4b). NACC analysis on DNase-seq 
signal resulted in very similar distributions of conserved chromatin 
accessibility patterns at promoters (Fig. 4c) and enhancers (Fig. 4d). Thus 
many sequence-conserved candidate cis-regulatory elements appeared 
to have conserved patterns of activities in mice and humans. 

Taken together, these analyses show that the mammalian cis-regulatory 
landscapes in the human and mouse genomes are substantially different, 
driven primarily by gain or loss of sequence elements during evolution. 
These species-specific candidate regulatory elements are enriched near 
genes involved in stress response, immunity and certain metabolic pro- 
cesses, and contain elevated levels of repeated DNA elements. On the 
other hand, a core set of candidate regulatory sequences are conserved 
and display similar activity profiles in humans and mice. 


Chromatin state landscape reflects tissue and cell identities 
We examined gene-centred chromatin state maps in the mouse and 
human cell types (see Supplementary Methods) (Fig. 5a, Supplementary 
Fig. 10). In all cell types, the low-expressed genes were almost uniformly 
in chromatin states with the repressive H3K27me3 mark or in the state 
unmarked by these histone modifications. In contrast, expressed genes 
showed the canonical pattern of H3K4me3 at the transcription start site 
surrounded by H3K4me1, followed by H3K36me3-dominated states in 
the remainder of the transcription unit. A similar pattern was seen for 
Figure 4 | Analysis of conservation in biochemical activities at the predicted 
mouse cis-regulatory sequences with human orthologues. a, b, Histograms 
show the distribution of the NACC score for the chromatin modification 
H3K27ac signal at the predicted mouse promoters (a) or enhancers (b). 

c, d, Histograms show the distributions of NACC scores for DNase I signal at 
the promoter proximal (c) and distal (d) DNase I hypersensitive sites (DHS). 


all the active genes, regardless of the level of expression; the only excep- 
tion was a tendency for the H3K4me3 to spread further into the tran- 
scription unit for the most highly expressed genes. The same binary 
relationship between chromatin state maps and expression levels of genes 
was observed in mouse and human cell types (Supplementary Fig. 10). 

For both mouse and human cells, the majority of the genome was in 
the unmarked state in each cell type, consistent with previous observations 
in Drosophila and human cell lines (Supplementary Fig. 2). 
About 55% of the mouse genome was in an unmarked state in all 
15 cell types examined, while 65% was unmarked in all six human cell 
types. For genes that were in the unmarked state in mouse, their ortho- 
logues in human also tended to be in the unmarked state, and vice versa, 
leading to a positive correlation for the amount of gene neighbourhoods 
in unmarked states (Supplementary Fig. 11). Strong correlations were 
also observed in profiles of other chromatin marks averaged over cell 
lines and tissues. The genes in the unmarked zones were depleted of 
transcribed nucleotides relative to the number expected based on the fraction 
of the genome included, and the levels of the transcripts mapped 
there were lower than those seen in the active chromatin states (Supplementary 
Fig. 12). 
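
The "unmarked in all cell types" figure quoted above can be computed from per-cell-type segmentations as sketched below. The representation is an assumption for illustration: each cell type is a list of state labels over the same fixed-size genomic bins, and the label `"unmarked"` is a placeholder name.

```python
def fraction_unmarked_everywhere(states_by_cell_type, unmarked="unmarked"):
    """Share of genomic bins labelled `unmarked` in every cell type.

    states_by_cell_type : dict mapping cell type -> list of state labels,
                          one label per bin, all lists of equal length.
    """
    tracks = list(states_by_cell_type.values())
    n_bins = len(tracks[0])
    shared = sum(1 for i in range(n_bins)
                 if all(track[i] == unmarked for track in tracks))
    return shared / n_bins
```

Note that this fraction can only stay the same or fall as more cell types are added, which is worth keeping in mind when comparing 15 mouse cell types with six human ones.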

Previous studies revealed limited changes of the chromatin states 
in lineage-restricted cells as they undergo large-scale changes in gene 
expression during maturation. The chromatin state maps recapitulated 
this result, showing very similar patterns of chromatin 
modification in a cell line model for proliferating erythroid progenitor cells 
(G1E) and in maturing erythroblasts (G1E-ER4 cells treated with oes- 
tradiol) across genes whose expression level changed significantly during 
maturation (Fig. 5b, Supplementary Fig. 10b). This limited change raised 
the possibility that the chromatin landscape, once established during 
lineage commitment, dictates a permissive (or restrictive) environment 
for the gene regulatory programs in each cell lineage, and that the 
chromatin states may differ between cell lineages. We tested this by 
examining the chromatin state maps for genes that were differentially 
expressed between haematopoietic cell lineages (erythroblasts versus 
megakaryocytes), and we found marked differences between the two 
cell types (Fig. 5c and Supplementary Fig. 10b). Genes expressed at a 
higher level in megakaryocytes than in erythroblasts were all in active 
chromatin states in megakaryocytes, but many were in inactive chro- 
matin states in erythroblasts (Fig. 5c). In the converse situation, genes 
expressed at a higher level in erythroblasts than in megakaryocytes showed 
more inactive states in the cells in which they were repressed (Supplemen- 
tary Fig. 10b). These greater differences in chromatin states correlating 
with differential expression of genes between, but not within, cell line- 
ages support the model that chromatin states are established during the 
process of lineage commitment. The clustering of cell types together by 
lineage based on chromatin state maps (Supplementary Fig. 10c) also 
supports the model that the landscape of active and repressed chro- 
matin is established no later than lineage commitment, and that this 
landscape is a defining feature of each cell type. Greater differences in 
chromatin states correlating with differences in gene expression were 
also observed when comparing average chromatin profiles in human 
and mouse. 


Mouse chromatin states inform interpretation of human 
disease-associated sequence variants 

To investigate whether the mouse chromatin states were informative 
on sequence variants linked to human diseases by genome-wide asso- 
ciation studies (GWAS), we combined the chromatin state segmenta- 
tions of the fifteen mouse samples into a refined segmentation, which 
we used to train a self-organizing map (SOM) on four histone modification 
ChIP-seq data sets (H3K4me3, H3K4me1, H3K36me3 and 
H3K27me3) for each mouse sample. We mapped 4,265 single nucleo- 
tide polymorphisms (SNPs) from the human GWAS studies uniquely 
onto the mouse genome and scored these SNPs onto the trained SOM 
to determine whether SNP subsets were enriched in specific areas of the 
Figure 5 | Chromatin landscape is stable within individual cell lineages. 

a, Map displaying the distribution of chromatin states over the neighbourhoods 
of human-mouse one-to-one orthologue genes in CH12 cells. The gene 
neighbourhood intervals were sorted by the transcription level of each gene, 
Figure 6 | Human GWAS hits when mapped onto mouse genome are 
associated with specific chromatin states. a, A self-organizing map of 
histone modification H3K4me1 shows association between kidney H3K4me1 
state and specific GWAS hits associated with urate levels (Methods). b, Liver- 
specific H3K36me3 unit shows enrichment in GWAS hits related to 
cholesterol, alcohol dependence and triglyceride levels. c, Brain-specific 
H3K27me3 high unit shows enrichment in GWAS SNPs associated with 
neurological disorders. d, Characterization of every unit with statistically 
significant GWAS enrichments in terms of highest histone modification signal 
in at least one sample. Units with no signal in top 100 map units for every 
histone modification are listed as none. RPKM, reads per kilobase per million 
reads mapped. 


map. As shown in Fig. 6a, the highest enriched H3K4me1 unit in the 
kidney contains five GWAS hits (P value < 3.95 X 10° '*) on different 
chromosomes related to blood characteristics such as platelet counts 
(Fig. 6a, Extended Data Table 2a). Similarly, the second highest enriched 
unit in liver H3K36me3 contained six GWAS hits (P value < 7.54 
X 10~*1) related to cholesterol and alcohol dependence out of twelve in 
that unit (Fig. 6b, Extended Data Table 2b). In contrast, one of the highest 
units in brain H3K27me3 has five GWAS hits (P value < 4.93 X 10 °°) 
on different chromosomes associated with brain disorders/response 
to addictive substances (Fig. 6c, Extended Data Table 2c). This unit is 
different from the other examples in that it is enriched for H3K27me3 
signal in multiple tissues, with brain being the highest. Of the 
1,350 units of the map, 801 showed statistically significant enrichment of SNPs 
(P < 0.05 after Holm-Bonferroni correction for multiple hypothesis testing), 
55% of which (accounting for 1,750 GWAS hits) had signal for at least 
one histone mark that ranked within the top 100 units on the map 
(Fig. 6d). The best histone marks for enriched GWAS units were primarily 
H3K4me1 (23%), H3K36me3 (18%) and H3K27me3 (12%), with 
H3K4me3 accounting for less than 2% of the remainder. Together these 
results suggest that the chromatin state maps can be used to identify 
potential sites for functional characterization in mouse for human GWAS 
hits. Indeed, ref. 23 shows that conserved DNA segments bound by 
orthologous transcription factors in human and mouse are enriched 
for trait-associated SNPs mapped by GWAS. 
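
The Holm-Bonferroni correction mentioned above is a step-down procedure that is easy to state in code. The sketch below is a generic implementation of that standard method, assuming one enrichment P value per map unit; how those P values were computed in the study is not shown here.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    P values are visited from smallest to largest; the k-th smallest
    (k starting at 0) is compared with alpha / (m - k). Testing stops
    at the first P value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: all larger P values are retained as well
    return reject
```

For example, with P values 0.01, 0.04, 0.03 and 0.005 at alpha = 0.05, the two smallest pass their thresholds (0.0125 and about 0.0167) and the third (0.03 against 0.025) does not, so only the first and last hypotheses are rejected.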


Large-scale chromatin domains are developmentally 
stable and evolutionarily conserved 

We mapped the positions of early and late replication timing bound- 
aries in each of 36 mouse and 31 human profiles (Fig. 7a). Significantly 
clustered boundary positions (above the 95th percentile of re-sampled 
positions) were identified and peaks in boundary density were aligned 
between cell types using a common heuristic (Extended Data Fig. 3a, b, 
Supplementary Fig. 13). After alignment, consensus boundaries were 
further classified by orientation and amount of replication timing sepa- 
ration, resulting in a more stringent filtering of boundaries (Supplemen- 
tary Figs 14, 15). Overall, we found that 88% of boundary positions (versus 
20% expected for random alignment; Fisher exact test P< 2 x 10 **) 
aligned in position and orientation between two or more cell types in both 
mouse and human (that is, 12% were cell-type-specific, Fig. 7b, Extended 
Data Fig. 3). Pair-wise comparisons of boundaries were consistent with 
Figure 7 | Replication timing boundaries preserved among tissues are 
conserved in mice and humans. a, Depiction of a timing transition region 
(TTR) between the early and late replication domains. Early and late 
boundaries are defined as slope changes at either end of TTRs. b, Boundaries 
conserved between species for matched mouse and human cell types as a 
function of preservation among mouse cell types. c, Percentage of boundaries 
conserved between species (bar graph) and overall conservation of boundaries 
between comparable mouse and human cell types (CH12 versus GM06990, 
mESC versus hESC, mouse epiblast stem cells (mEpiSC) versus hESC) as a 
function of preservation among mouse cell types. d, A Venn diagram compares 
the replication timing boundaries identified in the mouse and human genome. 


developmental similarity between cell types (Supplementary Fig. 16). 
The earliest and latest replicating boundaries were the best preserved 
between cell types, while mid-S replicating boundaries were 
highly variable (Extended Data Fig. 3e, f). 

Interestingly, the greatest number of boundaries was detected in 
embryonic stem cells in both species, with significant reduction in bound- 
ary numbers during differentiation (Supplementary Fig. 16), consistent 
with consolidation of domains and, by proxy, large-scale chromatin organization 
into larger ‘constant timing regions’ during differentiation. 
Given that over half of the mouse and human genomes exhibit significant 
replication timing changes during development, these observations 
support the model that developmental plasticity in replication 
timing is derived from differential regulation of replication timing 
within constant timing regions whose boundaries are preserved during 
development. 

Although conservation of replication timing between mouse and 
human has been reported, the conservation of replication timing 
boundaries has not been examined. We converted boundary coordinates 
± 100 kb across boundary positions between species, revealing 
significant overlap (Fig. 7c, d; P< 2.2 X 10 '° by Fisher’s exact test 
relative to a randomized boundary list). The level of conservation of 
the positions of boundaries improved from a median of 27% for cell- 
type-specific boundaries to 70% for boundaries preserved in nine or 
more cell types (Fig. 7c), demonstrating that boundaries most highly 
preserved during development were the most conserved across spe- 
cies. This was consistent with results for transcription (Fig. 2), as well 
as the previous observation that suggests that an increased plasticity of 
replication timing during development is associated with increased plasticity 
of replication timing during evolution. Together, these findings 
identify evolutionarily labile versus constrained domains of the mam- 
malian genome at the megabase scale. 
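
The cross-species boundary comparison can be sketched as a windowed overlap count. The sketch assumes the boundaries of one species have already been converted into the other species' coordinates (that conversion step is not shown), and it uses the ± 100 kb tolerance from the text; the function and argument names are hypothetical.

```python
import bisect

def fraction_conserved(query, reference, window=100_000):
    """Share of query boundaries lying within `window` bp of a reference
    boundary on the same chromosome.

    query, reference : dict mapping chromosome -> list of positions (bp),
                       both in the same coordinate system.
    """
    total = hits = 0
    for chrom, positions in query.items():
        ref = sorted(reference.get(chrom, []))
        for pos in positions:
            total += 1
            # First reference boundary at or beyond the window's left edge.
            i = bisect.bisect_left(ref, pos - window)
            if i < len(ref) and ref[i] <= pos + window:
                hits += 1
    return hits / total if total else 0.0
```

Running the same function on a randomized boundary list gives the background overlap against which an observed fraction can be compared.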

Given the link between replication and chromatin assembly, we com- 
pared replication timing and levels of other chromatin properties in 
200-kb windows across the genome (Supplementary Fig. 17). Features 
associated with active enhancers (H3K4me1, H3K27ac, DNase I sensitivity) 
were more closely correlated to replication timing than features 
associated with active transcription (RNA polymerase II, H3K4me3, 
H3K36me3, H3K79me2). By contrast, the correlation of replication 
timing to repressive features, such as H3K9me3, was poor and cell-type- 
specific, consistent with prior results. A more stringent comparison of 
differences in chromatin to differences in replication timing between 
cell types (Extended Data Fig. 3c, g, Supplementary Fig. 17) again revealed 
that marks of enhancers, including p300, H3K4me1 and H3K27ac, and 
DNase I sensitivity were more strongly correlated to replication timing 
than marks of active transcription. 
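
The windowed comparison described above amounts to averaging each signal in fixed 200-kb bins and correlating the binned values. The sketch below assumes a single chromosome and point measurements given as parallel lists of positions and values; it uses Pearson correlation, which is an assumption, since the text does not name the correlation measure.

```python
import math

def window_means(positions, values, window=200_000):
    """Average `values` measured at `positions` (bp) within fixed-size windows."""
    sums, counts = {}, {}
    for pos, val in zip(positions, values):
        w = pos // window
        sums[w] = sums.get(w, 0.0) + val
        counts[w] = counts.get(w, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def windowed_correlation(pos_a, val_a, pos_b, val_b, window=200_000):
    """Correlate two signals after averaging both in the same windows."""
    a = window_means(pos_a, val_a, window)
    b = window_means(pos_b, val_b, window)
    shared = sorted(set(a) & set(b))  # windows covered by both signals
    return pearson([a[w] for w in shared], [b[w] for w in shared])
```

Only windows covered by both signals enter the correlation, so sparse marks are compared on the subset of the genome where they were measured.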


Conclusion 


By comparing the transcriptional activities, chromatin accessibilities, 
transcription factor binding, chromatin landscapes and replication timing 
throughout the mouse genome in a wide spectrum of tissues and cell 
types, we have made significant progress towards a comprehensive 
catalogue of potential functional elements in the mouse genome. The 
catalogue described in the current study should provide a valuable ref- 
erence to guide researchers to formulate new hypotheses and develop 
new mouse models, in the same way as the recent human ENCODE 
studies have impacted the research community. 

We provide multiple lines of evidence that gene expression and its 
underlying regulatory programs have substantially diverged between the 
human and mouse lineages although a subset of core regulatory pro- 
grams are largely conserved. The divergence of regulatory programs 
between mouse and human is manifested not only in the gain or loss of 
cis-regulatory sequences in the mouse genome, but also in the lack of 
conservation in regulatory activities across different tissues and cell types. 
This finding is in line with previous observations of rapidly evolving 
transcription factor binding in mammals, flies and yeasts, and highlights 
the dynamic nature of gene regulatory programs in different species. 
Furthermore, by comprehensively delineating the potential cis-regulatory 
elements we demonstrated that specific groups of genes and regulatory 
elements have undergone more rapid evolution than others. Of parti- 
cular interest is the finding that cis-regulatory sequences next to immune- 
system-related genes are more divergent. The finding of species-specific 
cis-elements near genes involved in immune function suggests rapid 
evolution of regulatory mechanisms related to the immune system. 
Indeed, previous studies have uncovered extensive differences in the 
immune systems among different mouse strains and between humans 
and mice, ranging from the relative makeup of the innate immune and 
adaptive immune cells, to gene expression patterns in various immune 
cell types, and transcriptional responses to acute inflammatory insults. 
At least some of these differences may be attributed to distinct regulatory 
mechanisms, and our finding that many predicted mouse cis 
elements near genes with immune function lack sequence conservation 
supports the model that evolution of cis-regulatory sequences contri- 
butes to differences in the immune systems between humans and mice. 
More generally, our findings are consistent with the view that changes 
in transcriptional regulatory sequences are a source for phenotypic dif- 
ferences in species evolution. 

How can species-specific gain or loss of cis-regulatory elements during 
evolution be compatible with their putative regulatory function? The 
finding of different rates of divergence associated with regulatory pro- 
grams of distinct biological pathways suggests complex forces driving 
the evolution of the cis-regulatory landscape in mammals. We discov- 
ered that specific classes of endogenous retroviral elements are enriched 
at the species-specific putative cis-regulatory elements, implicating trans- 
position of DNA as a potential mechanism leading to divergence of gene 
regulatory programs during evolution. Previous studies have shown that 
endogenous retroviral elements can be transcribed in a tissue-specific 
manner, with a fraction of them derived from enhancers and necessary 
for transcription of genes involved in pluripotency. Future studies 
will be necessary to determine whether retroviral elements at or near 
enhancers are generally involved in driving tissue-specific gene expres- 
sion programs in different mammalian species. 

Despite the divergence of the regulatory landscape between mouse and 
human, the pattern of chromatin states (defined by histone modifications) 
and the large-scale chromatin domains are highly similar between the 
two species. Half of the genome is well conserved in replication timing 
(and by proxy, chromatin interaction compartment) with the other 
half highly plastic both between cell types and between species. It will 
be interesting to investigate the significance of these conserved and 
divergent classes of DNA elements at different scales, both with regard 
to the forces driving evolution and for implications of the use of the 
laboratory mouse as a model for human disease. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a 
small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus 
trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse 
genome across 25 cell and tissue types, collectively defining ~8.6 million transcription factor (TF) occupancy sites at nucleotide 
resolution. Here we show that mouse TF footprints conjointly encode a regulatory lexicon that is ~95% similar to 
that derived from human TF footprints. However, only ~20% of mouse TF footprints have human orthologues. Despite 
substantial turnover of the cis-regulatory landscape, nearly half of all pairwise regulatory interactions connecting mouse 
TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition 
sequences. Furthermore, the higher-level organization of mouse TF-to-TF connections into cellular network architec- 
tures is nearly identical with human. Our results indicate that evolutionary selection on mammalian gene regulation is 


targeted chiefly at the level of trans-regulatory circuitry, enabling and potentiating cis-regulatory plasticity. 


Gene regulation is classically partitioned into cis- and trans-acting com- 
partments, which are in turn integrated to form a regulatory network. 
The cis compartment comprises DNA elements that encode TF recog- 
nition sites, while the trans compartment encompasses hundreds of 
TF genes and their DNA recognition repertoires. The cross-regulation 
of TF genes by one another creates a regulatory network that facilitates 
complex information processing and potentiates robustness at the cel- 
lular and higher levels’. 

In metazoan genomes, actuatable TF recognition sites are clustered 
into compact (~100-300 bp) regulatory DNA regions that give rise to
DNase I hypersensitive sites (DHSs) upon TF occupancy in place of a 
canonical nucleosome’. Mice and humans diverged ~90 million years 
ago’, and an extensive survey of mouse DHSs indicates that the cis- 
regulatory DNA compartment has evolved markedly since the last com- 
mon ancestor’, generalizing and extending observations from selected 
TFs assayed by ChIP-seq in one or a few tissues”®. However, given the 
limited experimental resolution of previous studies, it is currently unknown 
how dynamic are individual in vivo TF recognition sites within broader 
regulatory regions, or more generally how cis-regulatory dynamics relate 
to the conservation of the higher-level cellular and physiological features 
that define mammals. Earlier studies of individual regulatory elements 
in Drosophila’ and zebrafish® indicate a potential for functional conser- 
vation without sequence conservation, and the maintenance of regula- 
tory activity with different phenotypic outcomes. However, the generality 
of these observations and their broader relevance for mammalian evo- 
lution is unclear. 

Genomic DNase I footprinting enables systematic delineation of
TF-DNA interactions at nucleotide resolution and on a global scale,
permitting: (1) the simultaneous interrogation of hundreds of DNA-binding
TFs expressed in a given cell type in a single experiment; (2) de
novo derivation of the cis-regulatory lexicon of an organism; and (3)
systematic mapping of TF-to-TF cross-regulatory networks.

To delineate an expansive set of specific mouse genomic sequence 
elements contacted by TFs in vivo, we performed genomic DNase I 
footprinting on 25 diverse mouse cell and tissue types (Extended Data 
Table 1). From an average of 323 million uniquely mapped DNase I 
cleavages per cell type, we identified an average of ~1 million high-confidence
(false discovery rate (FDR) 1%) DNase I footprints (6 to
40 base pairs (bp)), and a total of 8.6 million differentially occupied footprints
(Fig. 1a and Extended Data Fig. 1a). DNase I footprints were highly
reproducible (Extended Data Fig. 1b) and robust to intrinsic DNase I 
cleavage propensities (Extended Data Fig. 2a). 


Evolutionary turnover of TF footprints 


To study the evolution of TF occupancy patterns between mouse and 
human, we compared mouse DNase I footprint maps with those from 
41 diverse human cell types'*’” by using bi-directional pairwise align- 
ments of the mouse and human genomes’ to resolve mouse DNase I foot- 
prints to the human genome (Fig. 1b). In total, 65% of mouse TF footprint 
sequences could be localized within the human genome, comparable to 
the cross-alignment rate of entire ~150-bp DHSs* (Fig. 1c). However, 
whereas 35% of mouse DHSs have human orthologues that are also 
DNase I hypersensitive in at least one human cell type’, only 22% of 
mouse TF footprints have human sequence orthologues that are occu- 
pied in any of the human cell types assayed (Fig. 1c). This indicates that 
the individual DNA elements within DHSs that are directly contacted

Figure 1 | Footprinting the mouse genome and comparison with human
footprints. a, Derivation of 8.6 million differentially occupied DNase I 
footprints from 25 mouse cell and tissue types. b, Per-nucleotide DNase I 
cleavage across three gene promoters in both mouse and human cell types; 


by TFs in vivo have undergone massive turnover since the last common 
ancestor of mouse and human. 


Conservation of TF recognition lexicon 

Although most mouse TFs have human orthologues, the collective con- 
sequences of divergence in DNA binding domains and lineage-specific 
expansion of certain TF families (for example, KRAB zinc fingers) for 
the genomic occupancy landscape is unknown. We thus next explored 
the evolutionary stability of the mammalian TF recognition repertoire 
encompassed within mouse and human TF footprints. At directly occu- 
pied recognition sites for a given TF, footprinting data closely recapitu- 
late TF ChIP-seq'°"' (Extended Data Fig. 3), and average per-nucleotide 
DNase I cleavage profiles mirror the morphology of the DNA-protein 
binding interface*''*. Examination of cleavage profiles at occupied sites 
for diverse TFs showed these to be nearly identical between mouse and 
human cell types (Fig. 2a and Extended Data Fig. 2b), suggesting that 
in vivo DNA recognition preferences for many TFs have experienced 
little change between mouse and human. 

To investigate comprehensively the divergence of mouse and human 
TF recognition repertoires, we performed de novo motif discovery on the 
8.6 million mouse TF footprints. In total, we defined 604 unique motif 
models collectively accounting for the large majority of footprints (Fig. 2b), 
of which 355 models (59%) matched those within motif databases and 
249 were novel (Extended Data Fig. 4a). Comparison of known and novel 
mouse-derived motif models to motif models derived de novo from 
8.4 million human DNase I footprints” revealed that >94% of the col- 
lective TF lexicon is conserved between mouse and humans (Fig. 2c). 
The human lineage has witnessed expansion of certain TF gene fam- 
ilies, notably zinc finger TFs"; our results indicate that the proportion 
of genomic DNA elements bound by lineage-specific TFs in vivo is com- 
paratively small. The fact that TF footprints in mouse and human contain 
highly similar effective in vivo recognition sequence repertoires indicates

shared TF occupancy sites are indicated by faded boxes. c, Percentage of mouse
DNase I footprints with sequence aligning to the human genome but not 
occupied in any human cell type (grey) versus aligning footprints that are 
occupied in one or more human cell type (red). 


that regulatory divergence between mouse and humans has occurred 
chiefly at the level of individual TF-binding cis-regulatory elements. 
A total of 22 novel motif models were selective for the mouse line- 
age and 14 were selective for the human lineage (Fig. 2c). The 22 novel 
mouse-selective motifs are found chiefly in distal elements (Extended 
Data Fig. 4b), where they populate ~2% of DNase I footprints and show 
cell/tissue-specific occupancy, predominantly for mouse ES cells (Fig. 2d, e). 
This suggests that the TFs recognizing these elements may have impor- 
tant roles in very early development, when humans and rodents show 
more differences than at later stages’*, and further highlights the role of 
distal gene regulation in species divergence’®. Notably, whereas sequence 
matches to the 14 human-selective models in human DNase I footprints 
showed evidence of strong human-specific evolutionary constraint’®”” 
(Fig. 2f), nucleotide diversity at sequence matches to the 22 mouse- 
selective models in human DNase I footprints is compatible with signifi- 
cantly reduced human-specific evolutionary constraint (P < 0.05) (Fig. 2f), 
consistent with a loss of TF occupancy (and selective pressure) due to 
divergence (or loss) of the cognate factor within the human lineage. 


Conservation of TF-to-TF connections 


We next sought to characterize the core mouse TF regulatory network, 
and to compare its features with the human TF network. Genomic foot- 
printing provides a direct and empirical approach for mapping the core 
TF regulatory network of an organism comprising cross-regulatory inter- 
actions (network edges) between TF genes (network nodes). Footprint- 
anchored TF regulatory networks precisely recapitulate well-validated 
TF-to-TF regulatory connections’, and are agnostic to whether any 
given TF-to-TF regulatory interaction is positive (activating) or nega- 
tive (repressive), as these may vary conditionally even for a given TF. 
Following the approach of ref. 1, we mapped mouse TF-to-TF networks 
connecting the 586 mouse TF genes with known recognition sequences 
(Supplementary Information) within each of the 25 cell/tissue types 


©2014 Macmillan Publishers Limited. All rights reserved 


Figure 2 | Mouse TF footprints define a conserved cis-regulatory lexicon. 
a, Average per-nucleotide DNase I cleavage at occupied TF recognition sites 
within mouse and human DHSs. b, Of 604 motif models derived de novo from 
mouse footprints, 355 match curated databases. c, Comparison of 249 novel 
mouse motif models with models derived from human footprints. d, DNase I

Figure 3 | Evolutionary dynamics of cis-regulatory logic. a, Schematic
for construction of cell-type regulatory networks using TF footprints: TF
genes = network nodes; occupied TF motifs = directed network edges. b, TF 
genes regulated by OTX2 in fetal brain and retina networks. Symbols indicate 
known roles of target genes in brain versus retina development. c, Clustering

(Fig. 3a). This disclosed an average of 22,970 unique TF-to-TF edges
per cell type, totalling 77,084 non-redundant edges across all 25 cell 
types. Differences between cell types derived from both the cell-selective 
usage of TFs, as well as the cell-selective occupancy patterns of these TFs. 
For example, the neuronal developmental regulator OTX2 is selective 
for neuronal tissue, but its connectivity/occupancy patterns differ between 
distinct neuronal cell/tissue types (Fig. 3b). 

Mouse TF regulatory networks from functionally similar cell and tissue 
types are coherently organized into anatomical and functional groups 
(Fig. 3c), analogous to results from human TF regulatory networks’. 
However, although the similarity (pairwise Jaccard indices) between all 
mouse and human networks was mostly maximal between orthologous 
mouse-human cell and tissue pairs (Fig. 3d, e), network differences within 
each species were smaller than differences between species (Fig. 3e). 
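
The pairwise network similarity used above can be illustrated with a minimal sketch. It assumes each cell-type network is represented as a set of directed (regulator, target) edges; the function name and the toy edges are hypothetical and are not taken from the paper's data or code.

```python
def jaccard_index(edges_a, edges_b):
    """Jaccard index of two edge sets: |A intersect B| / |A union B|."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        # Two empty networks: define the similarity as 0.0 by convention.
        return 0.0
    return len(a & b) / len(union)


# Toy networks with hypothetical edges (regulator, target).
mouse_brain = {("OTX2", "SOX2"), ("SP1", "OTX2"), ("CTCF", "SP1")}
human_brain = {("OTX2", "SOX2"), ("SP1", "OTX2"), ("NRF1", "SP1")}

similarity = jaccard_index(mouse_brain, human_brain)
```

In this toy case two of the four distinct edges are shared. Computing the index for every mouse-human pair of networks would give a similarity matrix of the kind summarized in Fig. 3d, e.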

We next asked to what extent specific mouse TF-to-TF regulatory con- 
nections were conserved in human. We first identified TF-to-TF con- 
nections that were mouse-specific, human-specific or shared across both 
orthologous human and mouse cell types (Fig. 4a and Extended Data 
Table 2). We then differentiated shared regulatory edges (that is, pre- 
sent in both a mouse cell type and its human orthologue) arising from 
TF occupancy of an orthologous binding element from those shared 
edges arising from occupancy of non-orthologous sequence within 
regulatory DNA of the orthologous target gene (Fig. 4a). In the former 
case, both sequence and circuitry are conserved; in the latter, circuitry 
only. Overall, ~44% of the TF-to-TF regulatory connections are con- 
served between orthologous mouse and human cell types (P < 0.001) 
(Fig. 4b). However, >40% of these connections represent edges created

Figure 4 | Conservation of TF-to-TF regulatory circuitry. a, Four categories
of regulatory interactions identified by comparative analysis of mouse and 
human TF networks. Functionally conserved connections can be mediated by 
TF occupancy at orthologous (red) or non-orthologous (blue) binding sites. 
b, Categorization and overall conservation of TF-to-TF connections between 
orthologous mouse and human cell types. On average 44% of TF-to-TF edges 
are conserved (P < 0.001; empirically calculated using shuffled networks).

by TF binding to a novel sequence element arising since mouse-human
divergence (Fig. 4b). As such, conservation of functional regulatory cir- 
cuitry is considerably greater than indicated by sequence conservation 
alone. 
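
The four categories of regulatory interaction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: each network is assumed to map a (regulator, target) edge to the set of binding-element identifiers supporting it, with orthologous mouse and human elements sharing one identifier. The function name, category labels and identifiers are hypothetical.

```python
def classify_edges(mouse_sites, human_sites):
    """Assign each TF-to-TF edge to one of four categories.

    mouse_sites / human_sites: dict mapping (regulator, target) to a set of
    binding-element ids; orthologous elements are assumed to share an id.
    """
    categories = {}
    for edge in set(mouse_sites) | set(human_sites):
        in_mouse = edge in mouse_sites
        in_human = edge in human_sites
        if in_mouse and in_human:
            # Shared edge: is at least one supporting element orthologous?
            if mouse_sites[edge] & human_sites[edge]:
                categories[edge] = "shared_orthologous_site"
            else:
                categories[edge] = "shared_non_orthologous_site"
        elif in_mouse:
            categories[edge] = "mouse_specific"
        else:
            categories[edge] = "human_specific"
    return categories


# Toy input with hypothetical element ids.
mouse_sites = {
    ("SP1", "SOX2"): {"elem_1"},
    ("CTCF", "SP1"): {"elem_2"},
    ("OTX2", "SOX2"): {"elem_3"},
}
human_sites = {
    ("SP1", "SOX2"): {"elem_1"},   # same orthologous element occupied
    ("CTCF", "SP1"): {"elem_9"},   # same edge, different element
    ("NRF1", "SP1"): {"elem_4"},
}
edge_categories = classify_edges(mouse_sites, human_sites)
```

Under this representation, the fraction of conserved circuitry is the share of edges in either "shared" category, while sequence-level conservation counts only the orthologous-site category.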


Comparative TF network architecture 


We next compared the overall architecture of mouse and human TF 
networks. The architecture of complex networks can be analysed in terms 
of simple regulatory circuit ‘building blocks’ termed network motifs, such 
as the feed-forward loop (FFL)’’. In human, despite the general selec- 
tivity of specific TF-to-TF edges for specific cell types, the pattern of 
utilization of three-node network motifs within each individual cell 
type network is nearly identical’. Computing network motif utilization 
within each of the 25 mouse TF networks also revealed uniform pat- 
terns across mouse cell/tissue type regulatory networks (Extended Data 
Fig. 5a). Strikingly, these patterns are nearly identical with human, indi- 
cating that mouse and human TF networks utilize virtually the same 
architecture (Fig. 5a and Extended Data Fig. 5). 
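
As an illustration of one three-node network motif, the sketch below enumerates feed-forward loops (X regulates Y, and both X and Y regulate Z) in a directed edge list. It is a simplified example with assumed names: it ignores self-loops and does not check whether extra edges exist among the three nodes, so it does not distinguish the FFL from richer circuit types as a full motif census would.

```python
from collections import defaultdict


def feed_forward_loops(edges):
    """Return the set of (x, y, z) triples with edges x->y, x->z and y->z."""
    successors = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore autoregulatory self-loops
            successors[source].add(target)

    loops = set()
    for x in list(successors):
        for y in successors[x]:
            for z in successors.get(y, ()):
                if z != x and z in successors[x]:
                    loops.add((x, y, z))
    return loops


# Toy network with hypothetical edges.
edges = [
    ("CTCF", "SP1"),
    ("CTCF", "SOX2"),
    ("SP1", "SOX2"),
    ("SOX2", "NFE2"),
]
ffls = feed_forward_loops(edges)
```

Each returned triple is ordered as (driver, first passenger, second passenger), following the position names used in Fig. 5c, d.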

To analyse evolutionary conservation at the level of individual reg- 
ulatory circuits, we identified all instances of each three-node network 
motif within each mouse cell type, extracted the constituent TFs, and 
computed how the same TFs were connected in orthologous human cell 
types. Despite the conservation of overall network architecture between 
mouse and humans, this analysis revealed that the specific combinations 
of TFs comprising individual regulatory circuits have undergone sub- 
stantial remodelling between mouse and human (Fig. 5b and Extended 
Data Fig. 6). Overall, 39% of combinations of three TFs found within 
one or more three-node circuit in a given mouse cell type were also orga- 
nized into at least one type of three-node circuit in an orthologous 
human cell type (Extended Data Fig. 6b). For example, >25% of three- 
TF combinations organized into ‘regulating mutual’ circuits were con- 
served between orthologous mouse and human cell types, whereas only 
8% of three-TF combinations that form ‘mutual-and-three-chain’ cir- 
cuits show such conservation. By contrast, 12% of three-TF combinations 
that form ‘mutual-and-three-chain’ circuits lose one cross-regulatory 
interaction, transforming them into FFL circuits in orthologous human 
cell types (Fig. 5b and Extended Data Fig. 6c). Collectively, TF circuits 
conserved between mouse and human were enriched in four major net- 
work motif types: (1) the FFL motif; (2) the ‘regulated mutual’ motif; 
(3) the ‘regulating mutual’ (RM) motif; and (4) the ‘clique’ motif (Fig. 5b 
and Extended Data Fig. 6c). As such, these circuits appear to comprise 
the most vital building blocks of mammalian TF regulatory architectures. 


Conserved TF positions within networks 

We next asked to what degree the position of a specific TF within a given
network motif circuit was conserved between mouse and human. To 
analyse this, we focused on FFL and RM circuits, as these are both strongly 
conserved overall and have a clear top-down hierarchical organization 
(Fig. 5a, b). Computation of the propensity for each TF (of 586) to occupy 
each of the nodes within these network motifs revealed that the preferred 
position of a given TF within FFL and RM circuits is strongly conserved
between orthologous human and mouse cell types (Fig. 5c, d). It also 
revealed conserved preferential positioning of entire classes of TFs within 
particular network motif positions. For example, TFs with ubiquitous 
cellular functions such as CTCF, SP1 and NRF1 systematically localize 
within the driver positions of FFL and RM circuits (Fig. 5c, d), while TFs 
involved in cell lineage fate decisions (for example, SOX2, NFE2 and 
FOXP3) preferentially localized within the final passenger positions 
(Fig. 5c, d and Extended Data Fig. 7a, b). We also found the passenger 
edges of FFL and RM motifs to be significantly more cell-selective than 
the driver edges (Extended Data Fig. 7c, d). These findings raise the pos- 
sibility that one of the major functions of conserved mammalian network 
motifs may be to stabilize the expression of TFs that drive cell-type- 
specific regulatory programs via exploitation of stable cell-ubiquitous 
regulatory interactions. 
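
The positional propensity described above can be approximated by tallying how often each TF appears at each position across a list of circuits. This sketch assumes circuits are given as (driver, first passenger, second passenger) triples and uses a simple relative frequency. The paper's exact propensity measure is not specified here, so the function and toy circuits are illustrative only.

```python
from collections import Counter

POSITIONS = ("driver", "first_passenger", "second_passenger")


def position_propensity(circuits):
    """Relative frequency with which each TF occupies each circuit position."""
    counts = {}
    for circuit in circuits:
        for position, tf in zip(POSITIONS, circuit):
            counts.setdefault(tf, Counter())[position] += 1
    return {
        tf: {p: tally[p] / sum(tally.values()) for p in POSITIONS}
        for tf, tally in counts.items()
    }


# Toy circuits with hypothetical membership.
circuits = [
    ("CTCF", "SP1", "SOX2"),
    ("CTCF", "NRF1", "SOX2"),
    ("SP1", "CTCF", "NFE2"),
]
propensity = position_propensity(circuits)
```

Comparing such per-TF profiles between orthologous mouse and human networks is one way to ask whether a factor's preferred position is maintained.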
Figure 5 | Conserved organizing principles of mammalian TF regulatory
networks. a, Enrichment of three-node circuits in each mouse (red lines) and 
human (black lines) TF regulatory network (expanded in Extended Data Fig. 5). 
b, Left: frequency with which individual three-node circuits are identically 
maintained between the mouse and human Treg network. Middle: percentage of
specific three-node circuits identically maintained between the mouse and 


A conserved developmental program 

To explore how the TF regulatory network interacts with downstream 
non-TF structural/effector genes and to test for conserved interactions, 
we first quantified, for each TF, whether it preferentially regulates another 
TF gene(s) or a non-TF ‘structural’ gene(s) across different mouse and 
human cell types (Extended Data Fig. 8a). This parameter varied widely 
between different TFs; in general, TFs involved in development state 
specification such as HOXB1, OCT4 and SOX2 preferentially regulated 
other TF genes, while general transcriptional regulators such as NRF1, 
CTCF and SP1 preferentially regulated non-TF genes (Extended Data 
Fig. 8b, c). To test how these preferences varied by cell type, we aver- 
aged TF gene versus structural gene propensities for all TFs within each 
cell-type regulatory network. This revealed that the TF networks of plu- 
ripotent and early developmental cell types and tissues such as ES cells 
and fetal brain were globally significantly more oriented towards regu- 
lation of TF genes compared with the TF networks of more highly dif- 
ferentiated cell types (for example, B cells, T cells) and tissues (for example, 
adult brain) (Extended Data Fig. 8d). These TF versus structural gene 
preferences—both at the individual TF level and at the cell-type regula- 
tory network level—were strongly conserved between mouse and human 
(Extended Data Fig. 8d, e). The above findings suggest the operation of 
a conserved global developmental regulatory program that directs a shift 
in the orientation of TF regulatory networks from TF genes to structural 
genes during the transition from primitive to definitive cells. 
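
A minimal sketch of the TF-gene versus structural-gene preference follows. It assumes the preference of a TF can be summarized as the fraction of its target genes that are themselves TF genes, and that a network's orientation is the mean of these fractions. Both definitions, and all names and toy targets, are assumptions made for illustration rather than the metric used in Extended Data Fig. 8.

```python
def tf_gene_preference(targets_by_tf, tf_genes):
    """Fraction of each TF's target genes that are TF genes."""
    tf_genes = set(tf_genes)
    preference = {}
    for tf, targets in targets_by_tf.items():
        targets = set(targets)
        if targets:  # skip TFs with no mapped targets
            preference[tf] = len(targets & tf_genes) / len(targets)
    return preference


def network_orientation(preference):
    """Mean TF-gene preference across all TFs in one network."""
    if not preference:
        return 0.0
    return sum(preference.values()) / len(preference)


# Toy data; GENE_A, GENE_B and GENE_C stand in for non-TF structural genes.
tf_genes = {"SOX2", "OCT4", "HOXB1", "NRF1"}
targets_by_tf = {
    "SOX2": {"OCT4", "HOXB1", "GENE_A"},
    "NRF1": {"GENE_A", "GENE_B", "GENE_C", "SOX2"},
}
preference = tf_gene_preference(targets_by_tf, tf_genes)
orientation = network_orientation(preference)
```

A higher orientation value would correspond to a network weighted towards regulation of other TF genes, as reported for pluripotent and early developmental cell types.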

Taken together, our results expose several major organizing princi- 
ples of mammalian gene regulation, and a fundamental hierarchy in the 
modes of evolutionary transmission of regulatory information, ranging 
from poor conservation of cis-acting sequence elements to the preser- 
vation of trans-acting and network-level regulatory features (Fig. 6). 
Conservation of trans-acting components is reflected both in the effec- 
tive in vivo recognition repertoires of human and mouse TFs, which differ 
only slightly, and in the conserved patterns of TF-to-gene interactions. 
The dichotomy between cis- and trans-acting regulatory components is 
most apparent in the context of the core TF regulatory network. Whereas 
the individual DNA bases contacted by TFs in vivo have undergone 


human Treg network. Right: enrichment of three-node circuits in a network
constructed using edges present in both mouse and human Treg networks.

c, d, Frequency with which TFs from six functional classes occupy different 
positions (driver, first passenger, second passenger) within FFL (c) or RM (d) 
circuits in different mouse and human cell-type networks (hfBrain and hfHeart 
refer to human fetal brain and heart, respectively). 


extensive turnover since the last common ancestor of mouse and human, 
the repertoire of TFs regulating other TF genes is vastly more conserved. 
Notably, this cis-acting versus trans-acting disparity in mammals greatly 
eclipses that previously described for different Drosophila species”®. 

At the TF network level, organization of the regulatory circuitry in 
both mouse and human cell types appears to be governed by common 
principles that result in highly similar network architectures (Fig. 6). 
Conserved shifts in TF network orientation during the transition from 
primitive to definitive cells in both organisms suggest that the mam- 
malian regulatory network architecture has converged around a central 
goal of guiding cell identity during development. 

Collectively, our results indicate that evolutionary selection on gene 
regulation is targeted chiefly at the level of regulatory networks, and

Figure 6 | Hierarchy of evolutionary constraint on cis- versus trans-regulatory
features. Shown are: overall proportion of conserved DNA
bases between mouse and human’; proportion of orthologous TF footprints 
(from data shown in Fig. 1c); average proportion of individual conserved 
TF-to-TF regulatory connections across orthologous mouse and human cell 
types (from data shown in Fig. 4); and similarity in overall TF regulatory 
network architecture (from data shown in Figs 2 and 5).

explain how essential features of the mammalian body plan and physiology
have been maintained in the face of massive turnover of the cis-regulatory
landscape.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34
orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell 
lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and 
co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the 
mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and 
DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous 
DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is 
more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to 
be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites 
with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences. 


Determining the similarities and differences between mouse and human 
regulatory networks will not only improve our understanding of the evo- 
lution of regulatory mechanisms, but also help to interpret biomedical 
insights derived from research performed on mouse models. Recent 
genome-wide binding studies of eight TFs in several species uncovered 
many regulatory networks that have been highly rewired since the di- 
vergence of ancestors to mouse and human’, consistent with early studies 
in other species”. These results contrast sharply with other data showing 
that conservation of genomic DNA sequences can be a useful guide to 
discovery of regulatory regions’, and that the regulatory landscape can 
be highly conserved among more distant species’. Considering the large 
numbers of known TFs and their functional diversity, comprehensive 
studies on a broader range of TFs are needed to resolve these apparent 
discrepancies. Furthermore, our knowledge of the functional consequences 
of either divergence or conservation of TF occupancy remains limited. 


The mouse-human orthologous occupancy profiles 


To examine conservation of TF binding regions both between species 
and across different cell types, we generated and analysed a large data set 
of genome-wide binding profiles for 34 TFs in mouse and human. A 
diverse panel of TFs were chosen including those that bind DNA through 
specific consensus sequences, comprise part of the general transcrip- 
tional machinery such as RNA polymerase 2 (POL2), and modify or 
remodel chromatin (Extended Data Fig. 1a and Supplementary Infor- 
mation). For simplicity, we refer to the entire collection as TFs, even 
though some are general factors. We focused on occupancy by 32 TFs 
in cell line models for erythroid progenitors (mouse erythroleukaemia 
MEL and human leukaemia K562 cells) and lymphoblasts (mouse
lymphoma CH12 and human B lymphoblastoid GM12878 cells) in mouse
and human, and we also showed that the results are similar to those 
obtained in mouse and human embryonic stem cells (Extended Data 
Fig. 8). Chromatin immunoprecipitation with massively parallel sequen- 
cing (ChIP-seq) assays were conducted using replicate experiments and 
in accordance with ENCODE standards*. A total of 120 data sets were 
generated and analysed. 


Conserved and non-conserved features 


These genome-wide binding data for a large and diverse set of TFs 
revealed both conserved and non-conserved features of TF occupancy 
between mouse and human. First, although most TFs can reside at both 
promoters and distal sites, each shows a pronounced preference (Fig. 1a 
and Extended Data Fig. 2a, b). The preference is strongly conserved 
between mouse and human (R = 0.8; Extended Data Fig. 2c). The one 
exception is ETS1. Even though the primary motif in ETS1 is conserved
between mouse and human (Fig. 1b), it preferentially binds proximal 
to promoters in human but not in mouse. ETS1 is responsible for the 
mouse-specific expression of the T-cell marker Thy-1 in the thymus’, 
and we propose that this marked difference in its binding location may 
contribute to immune system differences between mouse and human”. 
Second, although the primary motifs of most sequence-specific TFs are 
conserved between mouse and human, the secondary motifs (for exam- 
ple, motifs of associated factors; see Supplementary Information) tend 
to be lineage-specific (Fig. 1b and Extended Data Fig. 2d), indicating a 
change in co-associated partners. 

The preferred chromatin states, defined by histone modifications, 
for occupied sequences (OSs) of orthologous TFs are also conserved

Figure 1 | General features comparison between
orthologous TF OSs. a, Each row represents one 
TF, and each column represents one genomic 
region. Heat-map colour shows the proportions of 
TF OSs (combination of different cell lines in the 
same species) that are located in each genomic 
region. b, Motif comparison for sequence-specific 
TFs examined in lymphoblast cells. In the right 
panel, each row represents one TF. The level of 
motif conservation is encoded by colour. Detailed 
results for the USF2 example are in the left panels. 
Peaks were divided into different bins according 
to the occupancy signal (higher signal on the left, 
lower on the right). The proportions of peaks with 
the motif in each bin (red lines) and the average 
distances between motif sites and peak summit 

in each bin (grey lines) are plotted against ranks of 
peak bins. Red dots indicate the proportion of 
control regions (±500 bp flanking the USF2 OS)
that have the motif. NA, not available. c, TF OS

between mouse and human. Using data on five histone modifications,
the mouse and human genomes were segmented into eight chromatin 
states (Fig. 1c and Extended Data Fig. 3a, b). Most TF OSs are located 
in states characteristic of promoters and enhancers (states 1-4). By con- 
trast, approximately 50% of OSs for the CTCF-cohesin complex (CTCF, 
RAD21 and SMC3) are located in states 5 and 8, which mark quiescent
regions with very low signal for all the histone modifications.
MAFK also shows preference for quiescent regions. Notably, both the 
CTCF-cohesin complex and MAFK® can mediate long-range inter- 
actions in the genome. The state preference is conserved between mouse 
and human (Fig. 1c; R = 0.9; Extended Data Fig. 3b), suggesting that 
the overall functions of the occupied segments are similar in the two 
species. Indeed, the proportion of enhancers, predicted by a different 
approach’*"*, is also conserved (R = 0.7) (Extended Data Fig. 4). 

We also examined DNA methylation profiles in TF OSs by using both
methylated DNA immunoprecipitation (MeDIP) and DNA digestion 
with methyl-sensitive restriction enzymes followed by sequencing (MRE- 
seq)'®. The TF OSs are highly enriched for MRE-seq signals and depleted 
of MeDIP-seq signals, showing that TF OSs are generally hypomethy- 
lated in both species (Fig. 1d and Extended Data Fig. 3c). 


TF- and location-specific occupancy conservation 
The TF binding regions are enriched for conservation of DNA sequences, 
showing a strong signal for evolutionary constraint within ±50 base pairs
(bp) of ChIP-seq peak summits (Fig. 2a). This result indicates that pu- 
rifying selection has acted on DNA sequences in many of the TF OSs, 
but it does not mean that all TF OSs are uniformly under constraint. 
Approximately 50% of TF OSs do not align between mouse and human” 
because either they are lineage-specific sequences such as transposable 
elements”, or they have diverged to an extent that they no longer align. 
We then focused on the subset of TF OSs in which the sequences 
aligned between mouse and human to determine whether orthologous

chromatin state preference comparison between
MEL and K562 cells. Heat map shows the 
percentage of TF OSs (rows) that overlap with eight 
different chromatin states (columns). d, The 
average signal distributions for MeDIP-seq and 
MRE-seq in MEL and K562 cells. Five-kilobase 
flanking regions centred on the TF OS peak 
summits were divided into 50-bp bins. Signals were

DNA sequences are also occupied by orthologous TFs (details in Supplementary
Methods). Notably, the proportion of TF OSs at which
occupancy was conserved varied markedly both among TFs and with 
the genomic locations (Fig. 2b). Conservation of occupancy is consis- 
tently higher in the promoter regions and lower in distal regions for 
almost all TFs, suggesting that the promoters may be under stronger 
selection than distal enhancers. Conserved promoter occupancy is ob- 
served both for factors that bind near promoters (NRF1 and MAZ) and 
for factors with a minority of binding sites in promoter regions (for example,
MEF2A and TAL1). A notable exception is the CTCF-cohesin
complex, which not only shows high levels of occupancy conservation 
as described previously’, but also the conservation remains high at prox- 
imal, middle and distal regions relative to the transcription start site 
(TSS) (Fig. 2b). These patterns of variation in conservation of occu- 
pancy are robust. One potential confounding factor is the tendency for 
promoter sequences to be more conserved than other regulatory regions, 
but adjusting the occupancy conservation by the sequence conserva- 
tion difference revealed similar trends, that is, the OSs in promoter re- 
gions are more conserved than those in other regions (Extended Data 
Fig. 5a). Similarly, removal of the few TFs for which markedly different 
numbers of peaks were called between mouse and human did not change 
the patterns of conservation of occupancy (Extended Data Fig. 5b and 
Supplementary Information). 

Next, we investigated how epigenetic factors influence TF binding 
at orthologous sites between mouse and human. As expected, the dis- 
tribution of chromatin states is highly similar for occupancy-conserved 
TF OSs. For orthologues of TF OSs that can be aligned between the two 
species but are bound only in one species, a smaller proportion were in 
enhancer-associated states (states 3 and 4) and a larger proportion were 
in either repressed (state 7) or quiescent (states 5 and 8) chromatin states 
(Fig. 2c and Extended Data Fig. 6a, b). Thus species-specific loss of TF 
occupancy at many sites is accompanied by a shift to repressive or 
quiescent chromatin. By contrast, the promoter states (states 1 and 2) 
were largely maintained in the second species even with the loss of TF 
binding. This result indicates that other TFs may help to maintain con- 
servation of a promoter state in these regions. We also searched for 
changes in the level of DNA methylation between TF OSs and their 
orthologous sequences. DNA methylation levels remained low in both 
species for occupancy-conserved TF OSs (Fig. 2d and Extended Data 
Fig. 6c), but the DNA methylation levels were significantly increased 
in the unbound, orthologous sequences. Thus, species-specific loss of 
TF occupancy is also associated with species-specific increases in DNA 
methylation. 


Occupancy conservation associates with pleiotropy 


We proposed that TF OSs with regulatory functions in several tissues 
would be under increased selective pressure, and thus more likely to 
be conserved in occupancy. To test this hypothesis, we first examined 
DNase I hypersensitive sites (DHSs) across 55 mouse tissues and cell 
lines to measure the chromatin accessibility of each TF OS among 
different tissues. Because DHSs are a proxy for regulatory element activity, 
TF OS regions accessible in multiple tissues are more likely to function 
in those tissues. Chromatin accessibility of TF OSs presents wide varia- 
tion, ranging from tissue-specific to ubiquitous patterns (Fig. 3a). Notably, 
the TF OSs with more pervasive chromatin accessibility across differ- 
ent tissues show the highest extent of occupancy conservation between 
mouse and human. The association between tissue usage and occupancy 
conservation is general; it was observed for most of the TFs examined 
(Extended Data Fig. 7b, c). This association is also robust to several 
potential confounding factors. CTCF-cohesin complexes, which are 
abundant and conserved across different tissue types and species, might 
be expected to bias the result; however, we obtained comparable results 
after removing all the genomic regions occupied by CTCF, RAD21 or 
SMC3 (Extended Data Fig. 7a). The conservation of promoter regions 
among several tissues and species might also be expected to bias our 
analysis, but, after removal of occupancy-conserved TF OSs that lie 
within 2 kilobases (kb) of TSSs, we still found that the association be- 
tween tissue usage and TF occupancy conservation holds for distal TF 
OSs (Extended Data Fig. 7d, e). Furthermore, specifically examining 
distal TF OSs that overlapped with enhancers predicted by chromatin 
signals showed that broad tissue usage of presumptive enhancers tracks 
strongly with conservation of occupancy between mouse and human 
(Fig. 3b). 
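
The tissue-usage measure behind this analysis is the Shannon index of DHS signal across 55 mouse tissues or cell lines (Fig. 3a legend). As a minimal sketch, assuming that the per-tissue signals are normalized to proportions and that the natural logarithm is used (neither choice is specified in this section), the index could be computed as follows. The function name and the toy signal vectors are illustrative and are not taken from the paper.

```python
import math


def shannon_index(signals):
    """Shannon index H = sum(-p_i * ln p_i) over per-tissue signal shares.

    Assumption: non-negative signals are normalized to proportions p_i;
    the exact normalization used in the study is not given here.
    """
    total = sum(signals)
    if total <= 0:
        return 0.0  # no signal in any tissue
    shares = [s / total for s in signals if s > 0]
    return sum(-p * math.log(p) for p in shares)


# Hypothetical DHS signals across 55 tissues (not data from the paper).
ubiquitous = [1.0] * 55                # accessible in every tissue
tissue_specific = [1.0] + [0.0] * 54   # accessible in a single tissue

print(shannon_index(ubiquitous))       # maximal value, ln(55)
print(shannon_index(tissue_specific))  # minimal value, zero
```

Under these assumptions a ubiquitously accessible site reaches the maximum ln(55) and a site accessible in a single tissue scores zero, which matches the legend's statement that high values mean accessibility in many cell types.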

A prediction of our hypothesis is that occupancy-conserved TF OSs 
will tend to be active in multiple tissues. To test this prediction experi- 
mentally, we randomly chose ten occupancy-conserved GATA1 OSs. 
Even though OSs were chosen on the basis of the occupancy profile of 
an erythroid-specific regulatory factor, all ten conserved OSs overlapped 
with DHS peaks and predicted enhancers in many tissues, such as brain 
(Fig. 3c). When tested for in vivo enhancer activity in transgenic mouse 
reporter assays at embryonic day 11.5, nine of the ten showed strong, 
reproducible in vivo enhancer activity, and four were active in non- 
erythroid tissues such as midbrain and neural tube (Fig. 3c). We ex- 
panded our analysis to examine other mouse GATA1 OSs that overlapped 
with previously tested enhancers deposited in the VISTA Enhancer 
Browser (http://enhancer.lbl.gov). Six GATA1 OSs that are specific to 
mouse generated positive enhancer assays; only one (16%) showed ex- 
pression in tissues other than blood vessels and heart. By contrast, among 
12 additional occupancy-conserved GATA1 OSs with in vivo enhancer 
activity, 6 (50%) were active in non-erythroid tissues such as midbrain 
(Supplementary Table 5). 


Conservation and divergence of TFs co-association 

Because precise gene regulation requires complex interactions among 
different TFs, we speculated that differences in conservation of TF 
occupancy may be related, at least in part, to different co-association 
partners. By calculating the occupancy signals for all the TFs in each TF 
OS, we found that, in general, occupancy-conserved TF OSs tend to be 
bound by more TFs compared to lineage-specific TF OSs (P < 2.2 × 10⁻¹⁶, 
two-tailed t-test; Fig. 4a), suggesting that co-association with several 
TFs increases the level of purifying selection on the occupied sequences. 
Furthermore, by examining each co-associated TF pair (Fig. 4b), 
we determined whether the co-associations were more enriched in 
occupancy-conserved versus species-specific binding sites (Fig. 4c and 
Extended Data Fig. 9). The relationships fell into three categories. In the 
first category, co-association of TFs is not linked with occupancy 
conservation. For example, RAD21 is highly associated with CTCF in MEL 
cells; however, this co-association occurs with equivalent frequency at 
occupancy-conserved and species-specific binding sites. In the second 
category, TF co-association is negatively correlated with occupancy 
conservation. For example, the co-association of MYC OSs with EP300, an 
enhancer-associated factor, is highly enriched in the mouse-specific 
binding sites. In the last category, TF co-association is positively 
correlated with occupancy conservation, as exemplified by the co-association 
of MYC OSs with the co-repressor SIN3A (ref. 23), suggesting that 
MYC-associated repressors tend to be conserved between mouse and human. 

Figure 3 | Conservation of occupancy is associated with chromatin 
accessibility and enhancer activity in multiple tissues. 
a, Association between occupancy conservation and chromatin accessibility 
across several tissues. The density plot represents the frequency that 
TF OSs are in accessible chromatin in varying numbers of cell types. The 
x axis is the Shannon index density calculated on the basis of the DHS 
signals in 55 tissues or cell lines in mouse; high values mean the TF OS 
is in accessible chromatin in many cell types. The red line shows the 
fraction of TF OSs at which occupancy is conserved within each bin of 
Shannon index. b, Association between occupancy conservation and enhancer 
usage across several tissues. The density plot represents the frequency 
that TF OSs are in chromatin indicative of enhancer activity (calculated 
using histone H3 acetyl Lys 27 (H3K27ac) ChIP-seq signals) in varying 
numbers of cell types. The x axis is the Shannon index calculated based on 
H3K27ac signal across 23 tissues or cell lines. The red line shows the 
fraction of TF OSs at which occupancy is conserved within each bin of 
Shannon index. pre-enhancer, presumptive enhancer. c, Results of transgenic 
mouse enhancer assays of ten occupancy-conserved GATA1 binding sites. The 
stained embryo images are highlighted by activity in different tissues: 
light pink for those showing enhancer activity only in heart and vascular 
tissues, darker pink for those with activities in other tissues. Right 
panel shows genes, enhancers predicted by histone modifications, chromatin 
states (using the software ChromHMM, see Methods), factor occupancy, and 
DHS signals across different tissues for regions containing two GATA1 OSs. 

Figure 4 | TFs co-association and occupancy conservation. a, Density plot 
shows the distribution of co-associated TF numbers in each TF-binding region. 
The x axis represents the total number of occupied TFs per region. b, Pair-wise 
TF co-association in MEL cells. The colour intensity represents the extent of 
co-association between the TFs denoted in the rows and columns compared to 
the random expectation (details in Supplementary Methods). Red represents 
co-association higher than random expectation, blue represents co-association 
lower than random expectation. c, Conditional TF OSs occupancy conservation 
in MEL cells. The colour intensity represents for a given TF (columns), 
whether the co-association with the other TF (rows) is more enriched in 
lineage-specific binding sites (green) or occupancy-conserved binding sites 
(red). The colour scale represents the extent (-log P value) of the 
enrichment significance. 


Occupancy conservation and functional SNVs 


In a previous study, we assigned putative regulatory potential to gen- 
ome variations by combining high-throughput experimental data sets, 
computational predictions, and manual annotation. Interestingly, even 
though conservation was not considered during the previous classifi- 
cations, we found that single nucleotide variants (SNVs) with high reg- 
ulatory potential were highly enriched in occupancy-conserved TF OSs 
(Extended Data Table 1a). Moreover, examination of the distribution 
of genome-wide association study (GWAS) single nucleotide polymor- 
phisms (SNPs) as a function of TF OS occupancy conservation revealed 
a significant enrichment of GWAS SNPs in occupancy-conserved TF 
OSs (P < 2.2 × 10⁻¹⁶, Fisher’s exact test; see Supplementary Information) 
compared with the background distribution of all genetic variation 
in the SNP database (dbSNP). When examining individual phenotypes, 
we found that SNPs associated with several phenotypes such as type I 
diabetes are significantly enriched in occupancy-conserved TF OSs (P = 
0.019, Fisher’s exact test; Extended Data Table 1b). However, SNPs as- 
sociated with other phenotypes, such as pulmonary function, are highly 
human-specific (P = 0.027, Fisher’s exact test; Extended Data Table 1b). 
Thus, although GWAS SNPs are generally enriched in occupancy- 
conserved TF OSs, this enrichment is phenotype-specific. 
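
The enrichment of GWAS SNPs in occupancy-conserved TF OSs is assessed with Fisher's exact test. A minimal sketch of a one-sided version of the test on a 2 × 2 table is given below. The sidedness, the table layout and all counts are assumptions made for illustration; the actual counts are in the Supplementary Information and are not reproduced here.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test P value for enrichment in a 2x2 table.

                       in conserved OSs   elsewhere
    GWAS SNPs                 a               b
    background SNPs           c               d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # all GWAS SNPs
    col1 = a + c          # all SNPs in occupancy-conserved OSs
    n = a + b + c + d     # all SNPs
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts, for illustration only (not from the paper).
p_value = fisher_enrichment_p(30, 70, 100, 900)
print(f"one-sided P = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch free of external dependencies; for genome-scale counts a statistics library would be the more practical choice.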


Discussion 


Here we report that the conservation of TF occupancy associates with 
pleiotropic functions. This observation was further validated by in vivo 
enhancer assays in transgenic mice. To our knowledge, this is the first 
systematic investigation and validation of the relationship between pleio- 
tropic TF OSs and their occupancy conservation. The pleiotropic func- 
tions of a regulatory module subject it to several constraints that preserve 
the underlying motifs and occupancy patterns. However, the roles in 
different tissues need not be carried out by the same TF. Paralogous 
proteins that bind to the same DNA motif (for example, GATA5 or 
GATA6) could be the active proteins in non-erythroid tissues at the 
GATA1 OSs with conserved occupancy and pleiotropic functions. This 
prediction can be tested in future studies. 

Cell lines were used in this study because they provide an abundant 
source of almost identical cells, whereas obtaining primary cells in suf- 
ficient number for a study of this scale is problematic for many cell types. 
One concern is that cell lines across different species may not be entirely 
analogous. Although this possibility cannot be ruled out, when we com- 
pared the expression profile of the four cell lines with those of many 
other mouse tissues, we found that both MEL and K562, and also CH12 
and GM12878, were the most similar pairs (Supplementary Fig. 2a). This 
close similarity was also seen for genome-wide histone modification sig- 
natures (Supplementary Fig. 2b). Thus, we conclude that the K562 and 
MEL pair of cell lines and the GM12878 and CH12 cell-line pair are 
sufficiently similar for meaningful cross-species comparisons. Another 
concern is that the trends observed in cell lines may not be represent- 
ative of primary cells. Examination of binding of five TFs in mouse and 
human ES cells confirmed the preferential conservation of binding at 
promoters and the correlation of occupancy conservation with pleio- 
tropy of DHSs (Extended Data Fig. 8). Thus, the principles gleaned from 
our examination of many TFs in cell lines are likely to hold for TFs in 
primary cells. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


The power of relativistic jets is larger than the 
luminosity of their accretion disks 


G. Ghisellini, F. Tavecchio, L. Maraschi, A. Celotti & T. Sbarrato 


Theoretical models for the production of relativistic jets from active 
galactic nuclei predict that jet power arises from the spin and mass 
of the central supermassive black hole, as well as from the magnetic 
field near the event horizon. The physical mechanism underlying 
the contribution from the magnetic field is the torque exerted on the 
rotating black hole by the field amplified by the accreting material. 
If the squared magnetic field is proportional to the accretion rate, 
then there will be a correlation between jet power and accretion 
luminosity. There is evidence for such a correlation, but inadequate 
knowledge of the accretion luminosity of the limited and inhomogeneous 
samples used prevented a firm conclusion. Here we report 
an analysis of archival observations of a sample of blazars (quasars 
whose jets point towards Earth) that overcomes previous limitations. 
We find a clear correlation between jet power, as measured through 
the γ-ray luminosity, and accretion luminosity, as measured by the 
broad emission lines, with the jet power dominating the disk 
luminosity, in agreement with numerical simulations. This implies that 
the magnetic field threading the black hole horizon reaches the 
maximum value sustainable by the accreting matter. 

The jet power is predicted to depend on $(aMB)^2$, where $a$ and $M$ are 
respectively the spin and mass of the black hole and $B$ is the magnetic 
field at its horizon. Seed magnetic fields are amplified by the accretion 
disk up to equipartition with the mass energy density, $\sim\rho c^2$ ($c$, speed of 
light; $\rho$, density), of the matter accreting at the rate $\dot{M}$. A greater $\dot{M}$ 
implies a larger $\rho$, which can sustain a larger magnetic field. This field 
can in turn tap a larger amount of the black hole rotational energy. The 
magnetic field is thus a catalyst for the process. Increasing the spin of 
the black hole shrinks the innermost stable orbit, increasing the 
accretion efficiency $\eta = L_{\rm disk}/(\dot{M}c^2)$ ($L_{\rm disk}$, accretion disk luminosity) to a 
maximum value $\eta = 0.3$. 

We use a well-designed sample of blazars that have been detected in 
the γ-ray wavelength band by the Fermi Large Area Telescope (LAT) 
and spectroscopically observed in the optical band (Methods). They 
have been classified as BL Lacertae objects or flat-spectrum radio qua- 
sars (FSRQs) according to whether the rest-frame equivalent width of 
their broad emission lines was greater than (FSRQ) or smaller than 
(BL Lac) 5 Å (rest frame). The sample contains 229 FSRQs and 475 
BL Lacs. Of the latter, 209 have a spectroscopically measured redshift. 
We considered all FSRQs with enough multiwavelength data to have a 
spectral energy distribution that allows the bolometric luminosity to be 
established. This amounts to 191 objects. For BL Lacs, we consider only 
the 26 sources for which broad emission lines were detected. This makes 
them the low-disk-luminosity tail of the full blazar sample. This choice 
is dictated by our desire to measure the accretion luminosity, together 
with the jet power. Through the visible broad emission lines, we 
reconstruct, using a template, the luminosity of the entire broad line region 
($L_{\rm BLR}$). The latter is a proxy for the accretion disk luminosity, $L_{\rm BLR} = \phi L_{\rm disk}$, 
with $\phi \approx 0.1$. The accretion disk luminosity is then directly given by the 
observed broad emission lines, avoiding contamination by the non-thermal 
continuum. Uncertainties are admittedly large (a factor of ~2) for 
specific sources, but the averages should be representative of the true values. 
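
The conversion from broad-line-region luminosity to disk luminosity is simple arithmetic. A sketch, taking the covering factor of about 0.1 and the quoted factor-of-2 uncertainty from the text, with an input luminosity that is purely hypothetical (units of erg s⁻¹ are assumed):

```python
PHI = 0.1  # ratio L_BLR / L_disk quoted in the text


def disk_luminosity(l_blr, phi=PHI):
    """Disk luminosity implied by a broad-line-region luminosity."""
    return l_blr / phi


def uncertainty_band(l_disk, factor=2.0):
    """Lower and upper bounds for a multiplicative uncertainty factor."""
    return l_disk / factor, l_disk * factor


l_blr = 1e45  # hypothetical L_BLR in erg/s, not a value from the sample
l_disk = disk_luminosity(l_blr)
low, high = uncertainty_band(l_disk)
print(f"L_disk ~ {l_disk:.2e} erg/s (range {low:.2e} to {high:.2e})")
```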

To model the non-thermal jet emission, we applied to all objects a 
simple, one-zone leptonic model (Methods), from which we derive the 
physical parameters of the jet. The only parameter of interest here, 
however, is the bulk Lorentz factor ($\Gamma$) of the outflowing plasma, found to 
lie in the range 10-15 (Methods and Extended Data Fig. 2). This range 
is similar to that obtained from measurements of the superluminal motion 
of the radio components, but that occurs at larger distances from the black 
hole. The bulk Lorentz factor is thus only weakly model dependent. The 
power that the jet expends in producing the non-thermal radiation is 


$$P_{\rm rad} = 2f\,\frac{L_{\rm jet}^{\rm bol}}{\Gamma^{2}} \qquad (1)$$

where $L_{\rm jet}^{\rm bol}$ is the bolometric jet luminosity, the factor of 2 accounts for 
the two jets and $f$ is of order unity (Methods). If this were the entire 
power of the jet, it would be entirely spent in producing the observed 
radiation. The jet would stop, and could not produce the radio lobes or 
the extended radio emission we see from these objects. It is thus a strict 
lower limit to the jet power. 
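
To illustrate equation (1), read here as P_rad = 2f L_jet^bol / Γ², the sketch below evaluates the radiative power at the two ends of the quoted range Γ = 10-15. The input luminosity and the choice f = 1 are hypothetical placeholders (the text says only that f is of order unity), and erg s⁻¹ is an assumed unit.

```python
def radiative_jet_power(l_bol_jet, gamma, f=1.0):
    """Radiative jet power P_rad = 2 f L_bol_jet / Gamma**2 (equation (1)).

    l_bol_jet: observed bolometric jet luminosity
    gamma:     bulk Lorentz factor of the outflowing plasma
    f:         order-unity factor; f = 1 is a placeholder assumption
    """
    if gamma < 1:
        raise ValueError("bulk Lorentz factor must be >= 1")
    return 2.0 * f * l_bol_jet / gamma ** 2


l_obs = 1e48  # hypothetical bolometric jet luminosity in erg/s
for gamma in (10, 15):
    print(gamma, f"{radiative_jet_power(l_obs, gamma):.2e}")
```

Because P_rad scales as Γ⁻², moving across the quoted range of Γ changes the estimate by a factor of (15/10)² = 2.25.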

Figure 1 shows $P_{\rm rad}$ as a function of $L_{\rm disk}$ for the 217 blazars that we 
consider. There is a robust correlation between the two: $\log(P_{\rm rad}) = 
0.98\log(L_{\rm disk}) + 0.639$ (with a probability $P < 10^{-8}$ of being random, even 
taking into account the common redshift dependence). We thus find a 
linear correlation between the minimum jet power and the accretion 
luminosity, as expected. Moreover, the two are of the same order. We 
note that this holds also for the considered BL Lacs that do show broad 
emission lines. The dispersion along the fitting line is $\sigma = 0.5$ dex. An 
important contribution to this dispersion comes from the large-amplitude 
variability of the non-thermal flux displayed by all blazars, 
especially in the γ-ray band, where the bolometric jet luminosity peaks. This 
is true even if we consider the LAT luminosity averaged over two years, 
as shown by the comparison between LAT and the older Energetic Gamma 
Ray Experiment Telescope (EGRET, on board the Compton Gamma Ray 
Observatory) results. About 20% of the EGRET-detected blazars are not 
detected by LAT, even though the sensitivity of the latter is 20-fold higher. 
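
The best-fit relation and its dispersion can be evaluated directly. The following sketch assumes base-10 logarithms and luminosities in erg s⁻¹ (the unit is not stated in this section), and uses a hypothetical disk luminosity as input.

```python
import math

SLOPE = 0.98      # best-fit slope quoted in the text
INTERCEPT = 0.639  # best-fit intercept quoted in the text
SIGMA_DEX = 0.5   # dispersion along the fitting line, in dex


def log_p_rad(log_l_disk):
    """Best-fit log10(P_rad) for a given log10(L_disk)."""
    return SLOPE * log_l_disk + INTERCEPT


def p_rad_band(l_disk, n_sigma=1):
    """Return (lower, central, upper) P_rad for an n-sigma dispersion band."""
    centre = log_p_rad(math.log10(l_disk))
    half_width = n_sigma * SIGMA_DEX
    return (10 ** (centre - half_width),
            10 ** centre,
            10 ** (centre + half_width))


l_disk = 1e46  # hypothetical disk luminosity in erg/s
low, mid, high = p_rad_band(l_disk)
print(f"P_rad ~ {mid:.2e} erg/s (1-sigma range {low:.2e} to {high:.2e})")
```

For this input the central estimate is within a factor of two of the disk luminosity, in line with the statement that the two quantities are of the same order.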

The power in radiation ($P_{\rm rad}$) is believed to be about 10% of the jet 
power ($P_{\rm jet}$), and, remarkably, this holds both for active galactic nuclei 
and γ-ray bursts. We confirm this result for the case in which there is 
one proton per emitting lepton (Methods and Extended Data Fig. 1). 
This limits the importance of electron-positron pairs, which would reduce 
the total jet power. In addition, pairs cannot largely outnumber 
protons, because otherwise the Compton rocket effect would stop the jet 
(Methods). 

An inevitable consequence of $P_{\rm jet} \sim 10P_{\rm rad}$ is that the jet power is 
larger than the disk luminosity. Therefore, the process that launches 
and accelerates jets must be extremely efficient, and might be the most 
efficient way of transporting energy from the vicinity of the black hole 
to infinity. 

Figure 1 | Radiative jet power versus disk luminosity. The radiative jet power 
versus the disk luminosity, calculated as ten times the luminosity of the broad 
line region. Different symbols correspond to the different emission lines 
used to estimate the disk luminosity, as labelled. All objects were detected using 
Fermi/LAT and have been spectroscopically observed in the optical. Shaded 
areas correspond to $1\sigma$, $2\sigma$ and $3\sigma$ (vertical) dispersion, where $\sigma = 0.5$ dex. 
The black line is the least-squares best fit ($\log(P_{\rm rad}) = 0.98\log(L_{\rm disk}) + 0.639$). 
The average error bar corresponds to uncertainties of a factor of 2 in $L_{\rm disk}$ 
(ref. 16) and 1.7 in $P_{\rm rad}$ (corresponding to the uncertainty in $\Gamma$). 


Assuming that $\eta = 0.3$, appropriate for rapidly rotating black holes, 
we have $\dot{M}c^2 = L_{\rm disk}/\eta$. Figure 2 shows $P_{\rm jet}$ versus $\dot{M}c^2$ for all our sources. 
The white stripe indicates $P_{\rm jet} = \dot{M}c^2$, and the black line is the best-fit 
correlation ($\log(P_{\rm jet}) = 0.92\log(\dot{M}c^2) + 4.09$), which always lies above the 
equality line. This finding is fully consistent with recent general 
relativistic magnetohydrodynamic numerical simulations in which the average 
outflowing power in jets and winds reaches 140% of $\dot{M}c^2$ for 
dimensionless spin values $a = 0.99$. The presence of the jet implies that the 
gravitational potential energy of the falling matter can not only be transformed 
into heat and radiation, but can also amplify the magnetic field, allowing 
the field to access the large store of black hole rotational energy and 
transform part of it into mechanical power in the jet. This jet power is 
somewhat larger than the entire gravitational power ($\dot{M}c^2$) of the 
accreting matter. This is not a coincidence, but is the result of the catalysing 
effect of the magnetic field amplified by the disk. When the magnetic 
energy density exceeds the energy density ($\sim\rho c^2$) of the accreting matter 
in the vicinity of the last stable orbit, the accretion is halted and the 
magnetic energy decreases, as shown by numerical simulations and 
confirmed by recent observational evidence. 
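
The comparison with the equality line can be checked numerically. The sketch below applies Ṁc² = L_disk/η with η = 0.3 and the quoted best fit, and solves for the power at which the fitted line would meet P_jet = Ṁc². Base-10 logarithms and erg s⁻¹ are assumptions, and the input luminosity is hypothetical.

```python
import math

ETA = 0.3          # accretion efficiency assumed in the text
SLOPE = 0.92       # best-fit slope for log P_jet versus log(Mdot c^2)
INTERCEPT = 4.09   # best-fit intercept


def accretion_power(l_disk, eta=ETA):
    """Accretion power Mdot c^2 implied by a disk luminosity."""
    return l_disk / eta


def log_p_jet(log_mdot_c2):
    """Best-fit log10(P_jet) for a given log10(Mdot c^2)."""
    return SLOPE * log_mdot_c2 + INTERCEPT


def crossover_log_power():
    """log10(Mdot c^2) at which the fit would meet the equality line."""
    return INTERCEPT / (1.0 - SLOPE)


l_disk = 1e46  # hypothetical disk luminosity in erg/s
log_acc = math.log10(accretion_power(l_disk))
excess_dex = log_p_jet(log_acc) - log_acc
print(f"log(Mdot c^2) = {log_acc:.3f}, fit exceeds equality by {excess_dex:.3f} dex")
print(f"fit meets equality at log(Mdot c^2) = {crossover_log_power():.3f}")
```

With these coefficients the fitted line stays above equality for any accretion power below roughly 10⁵¹ erg s⁻¹ (under the assumed units).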

The mass of the black holes of the FSRQs in our sample has been 
calculated assuming that the size of the broad line region scales with 
the square root of the ionizing disk luminosity as indicated by 
reverberation mapping, and by assuming that the clouds producing the 
broad emission lines are virialized. The uncertainties associated with 
this method are large (dispersion of $\sigma = 0.5$ dex for the black hole mass 
values), but if there is no systematic error (Methods) then the average 
Eddington ratio for FSRQs is reliable: $\langle L_{\rm disk}/L_{\rm Edd}\rangle = 0.1$ ($L_{\rm Edd}$, Eddington 
luminosity; Extended Data Fig. 2). This implies that all FSRQs should 
have standard, geometrically thin, optically thick accretion disks. 
Therefore, the more powerful jets (the ones associated with FSRQs) can be 
produced by standard disks with presumably no central funnel, 
contrary to some expectations. 
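
For reference, an Eddington ratio can be computed as below. The sketch uses the standard Eddington luminosity for ionized hydrogen, about 1.26 × 10³⁸ erg s⁻¹ per solar mass, which is a textbook value rather than one quoted in this paper; the black hole mass and disk luminosity are hypothetical.

```python
L_EDD_PER_SOLAR_MASS = 1.26e38  # erg/s per solar mass (textbook value)


def eddington_luminosity(m_bh_solar):
    """Eddington luminosity in erg/s for a black hole mass in solar masses."""
    return L_EDD_PER_SOLAR_MASS * m_bh_solar


def eddington_ratio(l_disk, m_bh_solar):
    """Disk luminosity in units of the Eddington luminosity."""
    return l_disk / eddington_luminosity(m_bh_solar)


# Hypothetical source: a 1e9 solar-mass black hole with L_disk = 1.26e46 erg/s.
print(f"L_disk / L_Edd = {eddington_ratio(1.26e46, 1e9):.2f}")
```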

A related issue is the possible change of accretion regime at low 
accretion rate (in Eddington units), or, equivalently, when $L_{\rm disk} \lesssim 10^{-2} L_{\rm Edd}$. 

Figure 2 | Jet power versus accretion power. The total jet power estimated 
using a simple one-zone leptonic model, assuming one cold proton per 
emitting electron, versus $\dot{M}c^2$ calculated assuming an efficiency $\eta = 0.3$, 
which is appropriate for a maximally rotating Kerr black hole. Different 
symbols correspond to the different emission lines used to estimate the disk 
luminosity, as in Fig. 1. Shaded areas correspond to $1\sigma$, $2\sigma$ and $3\sigma$ (vertical) 
dispersion, where $\sigma = 0.5$ dex. The black line is the least-squares best fit 
($\log(P_{\rm jet}) = 0.92\log(\dot{M}c^2) + 4.09$). The white stripe is the equality line. The 
average error bar is indicated ($\dot{M}c^2$ has the same average uncertainty as $L_{\rm disk}$; the 
average uncertainty in $P_{\rm jet}$ is a factor of 3). 


In this case, the disk is expected to become radiatively inefficient, hotter 
and geometrically thick. How the jet responds to such changes is still an 
open issue. An extension of our study to lower luminosities could 
provide some hints. Another open issue is how the jet power depends on 
the black hole spin. Our source sample consists by construction of 
luminous γ-ray sources that presumably have the most powerful jets, and 
thus have the most rapidly spinning holes. It will be interesting to explore 
less luminous jetted sources, to gain insight into the possible 
dependence of the jet power on the black hole spin and the possible existence 
of a minimum spin value for the jet to exist. In turn, this should shed 
light on the longstanding problem of the radio-loud/radio-quiet quasar 
dichotomy. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Artificial chemical and magnetic structure at the 
domain walls of an epitaxial oxide 

Progress in nanotechnology requires new approaches to materials 
synthesis that make it possible to control material functionality down 
to the smallest scales. An objective of materials research is to achieve 
enhanced control over the physical properties of materials such as 
ferromagnets, ferroelectrics and superconductors. In this context, 
complex oxides and inorganic perovskites are attractive because slight 
adjustments of their atomic structures can produce large physical 
responses and result in multiple functionalities. In addition, these 
materials often contain ferroelastic domains. The intrinsic symmetry 
breaking that takes place at the domain walls can induce properties 
absent from the domains themselves, such as magnetic or ferroelectric 
order and other functionalities, as well as coupling between them. 
Moreover, large domain wall densities create intense strain gradients, 
which can also affect the material’s properties. Here we show that, 
owing to large local stresses, domain walls can promote the formation 
of unusual phases. In this sense, the domain walls can function 
as nanoscale chemical reactors. We synthesize a two-dimensional 
ferromagnetic phase at the domain walls of the orthorhombic perovskite 
terbium manganite (TbMnO₃), which was grown in thin layers under 
epitaxial strain on strontium titanate (SrTiO₃) substrates. This phase 
is yet to be created by standard chemical routes. The density of the 
two-dimensional sheets can be tuned by changing the film thickness 
or the substrate lattice parameter (that is, the epitaxial strain), and 
the distance between sheets can be made as small as 5 nanometres in 
ultrathin films, such that the new phase at domain walls represents 
up to 25 per cent of the film volume. The general concept of using 
domain walls of epitaxial oxides to promote the formation of unusual 
phases may be applicable to other materials systems, thus giving access 
to new classes of nanoscale materials for applications in 
nanoelectronics and spintronics. 

Oxide heteroepitaxy is a powerful strategy for strain engineering, 
because a very thin film grown epitaxially on a single-crystal substrate 
of slightly different lattice parameter can adopt the structure of the sub- 
strate. Because complex oxides are known to owe their physical responses 
to the subtle balance of several competing interactions, small modifi- 
cations in the atomic distances can give rise to dramatic changes in the 
magnetic or electrical responses. Therefore, strained films can display 
physical properties very different from the bulk, and can even exhibit 
novel phases. Apart from the horizontal interfaces created by growing 
one oxide on top of another, another type of interface can appear during 
epitaxial growth between two regions of the film with different crystal 
orientations. In some materials, these domain walls, or twin walls, 
have also shown higher conductivity than the contiguous domains. 

Strained, (001)-oriented TbMnO₃ films have been grown on 
(001)-oriented SrTiO₃ substrates (refs 16-18 and Methods). Despite the large 
mismatch of 5% between the lattice parameters of the film and the 
substrate, the similarity between their in-plane lattice areas makes it 
possible for TbMnO₃ to be grown atomically flat and with high crystalline 
quality on single-crystal SrTiO₃ substrates, aided by the formation of 
crystallographic domains. In TbMnO₃, as in most orthorhombic 
perovskites, the Tb atoms order in zigzag fashion along the [001] direction. 
Because of symmetry considerations, this zigzag ordering is mirrored 
at every domain wall of a [001]-oriented film. This produces a large 
difference in the bond distances at the domain walls, creating large strains 
highly localized in two-dimensional (2D) sheets at the walls. In 
epitaxially strained thin films, the average size of the domains depends on the 
magnitude of the strain and on the film thickness, making it 
possible to engineer different domain wall densities and to investigate the 
effect of the intense and largely localized stresses on the functional 
properties of the films. 

The local structure and chemistry of the films was investigated using 
scanning transmission electron microscopy (STEM) techniques (Meth- 
ods). Figure 1a shows a high-angle annular dark-field (HAADF) image 
of the cross-section of one of the films. Apart from the domain walls 
(observed as vertical lines in the image), the films do not present dislo- 
cations or interfacial layers, suggesting that domain formation is the main 
mechanism responsible for accommodating the epitaxial strain in the 
films. Geometrical phase analysis of the HAADF image shows, along 
the whole film, a homogeneous change in the unit-cell strain in the 
out-of-plane direction ($\varepsilon_{zz}$) of about −5% with respect to the substrate lattice 
parameter ($a_{\rm s} = 0.390$ nm) (Fig. 1b). The strain in the in-plane direction 
($\varepsilon_{xx}$) is also homogeneous within each domain, but at the domain walls 
there is 3% less strain than in the domain bulk (Fig. 1c). Figure 1d shows 
an atomically sharp TEM image of the same film taken in plane-view 
mode. Owing to the fourfold symmetry of the substrate, the domain walls 
tend to run along the two perpendicular in-plane directions. The domain 
wall structure and density coincide perfectly with those observed by the 
bright-field TEM image in Fig. 1e. For the thinnest films, the domains 
can be as small as 5 nm in the direction perpendicular to the walls. The 
clear observation of the domain walls in the HAADF-STEM images as 
atomically sharp lines raises the question of their nature and the origin 
of this peculiar contrast. 
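
As a worked example of the strain values quoted above, the sketch below converts a strain measured relative to the substrate lattice parameter into an absolute spacing, assuming the usual geometrical-phase-analysis definition ε = (d − a_ref)/a_ref with the substrate as reference. That definition and the function names are assumptions, not details given in the text.

```python
A_SUBSTRATE_NM = 0.390  # substrate lattice parameter quoted in the text (nm)


def strained_spacing(strain, reference_nm=A_SUBSTRATE_NM):
    """Lattice spacing d implied by strain = (d - reference) / reference."""
    return reference_nm * (1.0 + strain)


def strain_from_spacing(spacing_nm, reference_nm=A_SUBSTRATE_NM):
    """Inverse relation: strain of a spacing relative to the reference."""
    return (spacing_nm - reference_nm) / reference_nm


out_of_plane = strained_spacing(-0.05)  # about -5% out-of-plane strain
print(f"out-of-plane spacing ~ {out_of_plane:.4f} nm")
```

Under this definition, a strain of about −5% corresponds to an out-of-plane spacing close to 0.37 nm.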

Further insight into the nature of the walls is provided by a detailed 
analysis of the HAADF images in Fig. 2a, obtained on a 25 nm-thick 
TbMnO₃ film in the vicinity of the SrTiO₃ substrate. The domain walls 
exhibit columns with alternating contrast along the pseudo-cubic [001] 
direction. A detail of one of these walls, shown in Fig. 2b, enables the 
construction of a model representing the atomic structure of the wall 
and based on the Z contrast of the metal ions in the HAADF image and 
the crystal structure of bulk TbMnO3 (Z, atomic number). Assisted by

Figure 1 | Atomic-resolution domain structure of strained TbMnO3.

a, Cross-sectional HAADF-STEM image of a 25 nm-thick TbMnO3 thin film
grown on SrTiO3. b, c, Components ε_yy (b) and ε_xx (c) of the strain tensor
(colour scales), obtained by geometrical phase analysis of a. This shows that the
domains grow uniformly strained, whereas stress is partly released at the
domain walls. d, e, HAADF-STEM (d) and bright-field TEM (e) images of
TbMnO3 thin films with the same thickness in plane-view configuration,
showing a coincident in-plane domain structure.


this model, a simple domain wall structure can be proposed on the basis 
of the alternation of fully Tb-occupied columns (“Tb columns’) and Tb- 
deficient columns (‘X columns’) of A sites of the ABO3 perovskite struc- 
ture, which could be attributed either to Tb vacancies or to replacement 
of Tb by a lighter element. Though HAADF imaging suggests the exis- 
tence of a reduced amount of Tb in the A sites of every other column of 
atoms at the domain walls, this technique cannot fully assess the chem- 
ical nature of the X columns. To that end, atomic-resolution chemical 
mapping has been carried out combining aberration corrected HAADF- 
STEM imaging and electron energy loss spectroscopy (Methods). This 
permits an unambiguous determination of the chemical composition 
of each atomic column. Figure 2c-f reveals that the X columns at the 
domain walls consist of Mn atoms substituting for Tb atoms. By com- 
parison of the Mn signal from the X positions with that from regular 
Mn positions (outside the walls), it can be stated that Tb is replaced with 
Mn at almost all sites in most X columns. The same reasoning suggests 
that the Mn lattice at B sites of the wall apparently remains unperturbed 
with respect to the matrix, in such a way that the wall appears atomi- 
cally thin from the crystallographic and chemical viewpoint. We claim 
that this chemical substitution of Tb by the smaller Mn cation takes 
place to avoid the presence of very close Tb-Tb atom pairs, which would 
occur in our domain walls as a result of the Tb zigzag ordering along the 
z direction (Fig. 2b). Indeed, the ordered Mn-for-Tb substitution releases 
the stress at the domain wall, as confirmed by Fig. 1c. 

We now turn to the physical properties of the newly synthesized
phase. In TbMnO3, the main magnetic interactions are ferromagnetic
within each Mn (001) layer and antiferromagnetic between
the layers. It is then expected that the additional Mn atom present at
the wall, placed between two antiferromagnetically interacting Mn planes, 
will experience magnetic frustration because it cannot be simultaneously 
aligned ferro- or antiferromagnetically with both neighbouring layers. 
This frustration leads to canting of spins, close to the walls, resulting in 
the appearance of a net magnetization. A net magnetic moment has 
been observed in epitaxially grown thin films of TbMnO3 (refs 16, 18,
23, 24) and other orthorhombic manganites. Various mechanisms
have been put forward to explain this macroscopic magnetic response

Figure 2 | Structure and chemistry of the domain walls. a, HAADF-STEM
image of the TbMnO3-SrTiO3 interface. b, Detail of a domain wall close to the
interface with the substrate, with the proposed atomic model superimposed.
c-f, Spectrum image of the domain wall collected simultaneously with the
HAADF signal (c): integrated intensities of the Tb M4,5 (d) and Mn L2,3 (e)
edges from the spectrum image. f, Colour map composed using d and e, with
the Mn signal in red and the Tb signal in green, showing the substitution of
Mn for alternate Tb atoms to create a new 2D phase at the domain wall.


so distinct from that of the bulk material: strain-induced spin canting,
interface magnetism, uncompensated spins at antiferromagnetic domain
walls and magnetoelectric coupling at domain walls have been reported.
Solving the magnetic structure of a new Mn-O environment embedded
in a crystal of TbMnO3, which already has a complex magnetic structure,
is a great challenge for which a holistic investigation, including
theoretical calculations, is needed. Here we present the first necessary
steps in this direction. 

The magnetic properties have been investigated using SQUID (super- 
conducting quantum interference device) magnetometry. Figure 3a shows 
the magnetic susceptibility as a function of temperature measured on 
heating under field-cooling and zero-field-cooling conditions. The split- 
ting between field cooling and zero field cooling that takes place below 
~40 K (the bulk paramagnetic—antiferromagnetic transition temper- 
ature), as well as the shape of the inverse susceptibility curves deviating 
downwards with respect to the Curie behaviour (Methods), clearly point 
to the presence of a net magnetic moment in the films, which decreases 
with increasing film thickness. Figure 3b plots the in-plane component
of the magnetization (M_ip) versus magnetic field (H) measured at 10 K for
films of different thicknesses. By zooming in around the low-field region
(Fig. 3c), it can be seen that the remanent magnetization M_ip(H = 0)
scales inversely with the film thickness, the same as the density of domain
walls (or the inverse domain area), and is as large as 0.48 Bohr magnetons
(μB) per formula unit (f.u.) for the 5 nm film and 0.11 μB f.u.⁻¹
for the 25 nm film (Fig. 3d).
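
The inverse-thickness scaling can be checked roughly with the two values quoted above. The sketch below is only an illustration: it uses just these two data points (the full data set is in Fig. 3d), and the function and variable names are hypothetical. If the remanent magnetization is proportional to 1/thickness, the product of magnetization and thickness should be about the same for both films.

```python
# Remanent magnetization values quoted in the text (Bohr magnetons per formula unit).
thickness_nm = [5.0, 25.0]
m_rem = [0.48, 0.11]

def thickness_products(thicknesses, moments):
    """Return M * t for each film; near-constant values indicate M proportional to 1/t."""
    return [m * t for m, t in zip(moments, thicknesses)]

products = thickness_products(thickness_nm, m_rem)
# Relative spread of the products around their mean (0 would be perfect 1/t scaling).
spread = (max(products) - min(products)) / (sum(products) / len(products))

for t, p in zip(thickness_nm, products):
    print(f"t = {t:4.1f} nm  ->  M*t = {p:.2f} muB nm per f.u.")
print(f"relative spread: {spread:.0%}")
```

The two products differ by roughly a seventh of their mean, which is compatible with an approximately inverse dependence but cannot by itself establish it.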

We performed first-principles calculations to gain further atomistic 
insight into this novel structure at the domain walls of our TbMnO3 films
(Methods). We modelled the domain wall by considering the boundary
between two ferroelastic domains that are rotated by 90° (about the
out-of-plane [001] axis) with respect to each other, which allows us to
reproduce the pattern of Tb displacements that is apparent in Fig. 2b.

Figure 3 | Magnetic behaviour of the strained TbMnO3 films. a, Field-
cooled (FC) and zero-field-cooled (ZFC) magnetic susceptibilities as functions
of temperature for various thicknesses. b, In-plane magnetization (M) versus
magnetic field (H) at 10 K, for the films in a. c, Close-up of the low-field region
of two of the curves in b, showing the remanent magnetization, M_ip(H = 0).

Additionally, we placed Mn atoms at alternating A sites in the boundary
plane, also in accordance with our experimental findings. We then ran 
a structural relaxation of this initial structure, including a short sim- 
ulated annealing to better search for the global energy minimum, and 
obtained the result depicted in Fig. 4. Interestingly, we find two different 
types of Mn atom occurring at the boundary planes. The first type (Mn(1) 
in the following) presents a tetrahedral coordination with four nearest- 
neighbouring oxygen atoms; in contrast, the second type (Mn(2)) dis- 
plays a quasi-square-planar coordination with four nearest-neighbouring 


Figure 4 | Crystal structure of the new 2D phase. a, b, Lateral (a) and top 
(b) views of the DFT+U supercell, containing two domains and two domain 
wall planes (light blue). The A-site columns in which Tb is replaced by Mn are 
most clearly seen in a, showing the discontinuity in the zigzag Tb displacement 
pattern across the domain wall. Red, O; pink, Tb; dark blue, Mn in domains.

The magnetization is normalized per formula unit of a hypothetical
homogeneous TbMnO3 film. d, M_ip(H = 0) versus the inverse of the film
thickness. The inverse domain area (or density of domain walls) is also plotted
using the data of ref. 10.


oxygens. This difference in local coordination is a consequence of the 
structural discontinuity, affecting the rotations of the O6 octahedra,
associated with our twin boundary. This gives rise to two crystallograph- 
ically different A sites that alternate along the in-plane direction parallel 
to the wall (Fig. 4b, c). 

As expected, the differently coordinated Mn atoms have distinctive 
properties. Our calculations indicate that Mn(1) atoms have an associated 
magnetic moment of about 4.5 μB, which is considerably larger than that
of the Mn atoms within the domains (about 3.7 μB). This result suggests


c, Detail of one column of substitutional Mn cations (light blue) at the domain
wall. The two distinct crystallographic A sites at the domain wall result from the
patterns of oxygen octahedra rotations of the neighbouring domains. The
spatial directions correspond to the orthorhombic setting. Light blue, Mn at
domain walls.

that Mn(1) is less positively charged than the Mn cations within the
domains. For Mn(2), we obtain a magnetic moment of about 3.8 μB,
which is much closer to the value obtained for the regular B-site Mn 
cations. We also computed the average magnetic interaction between the 
Mn atoms located in the domain wall and its neighbouring Mn cations. 
In addition, we did embedded cluster calculations to check the
appropriateness of this approximation (Methods). To simplify the DFT+U
calculation (density functional theory plus ‘Hubbard U’; see Methods),
we assumed a ferromagnetic arrangement of B-site Mn spins in our simulated
supercell, and computed the energies associated with having different
spin arrangements of Mn(1) and Mn(2). We found that, on average,
Mn(1) interacts antiferromagnetically with its eight neighbouring Mn
cations, the corresponding coupling constant being J(1) ≈ 1.61 meV (we
obtain 1.92 meV if no epitaxial constraints, that is, bulk-like conditions,
are assumed in the simulation). In contrast, we obtain an average
ferromagnetic interaction of J(2) ≈ −0.63 meV for Mn(2) (−0.58 meV in
bulk-like conditions). Finally, we find a small antiferromagnetic coupling, 
of about 0.08 meV (0.07 meV in bulk-like conditions) between neigh- 
bouring Mn(1) and Mn(2) atoms within the wall. 
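
The sign convention implied by these values (positive J antiferromagnetic, negative J ferromagnetic) can be illustrated with a minimal sketch. It assumes a classical Heisenberg-type pair energy E = J Σ_j s0·sj with unit-length spin vectors and eight identical, mutually parallel neighbours. This normalization and the helper name `pair_energy` are assumptions made for illustration; the spin model actually used is described in Methods.

```python
import math

# Average coupling constants quoted in the text (meV, epitaxially constrained case).
J1_MEV = 1.61    # Mn(1): positive, antiferromagnetic
J2_MEV = -0.63   # Mn(2): negative, ferromagnetic
N_NEIGHBOURS = 8

def pair_energy(j_mev, angle_deg, n_neighbours=N_NEIGHBOURS):
    """Energy (meV) of one wall spin at angle_deg to n identical unit neighbour spins.

    Assumes E = J * sum_j (s0 . sj) with unit vectors (illustrative convention).
    """
    return j_mev * n_neighbours * math.cos(math.radians(angle_deg))

for label, j in (("Mn(1)", J1_MEV), ("Mn(2)", J2_MEV)):
    e_par = pair_energy(j, 0.0)
    e_anti = pair_energy(j, 180.0)
    preferred = "antiparallel" if e_anti < e_par else "parallel"
    print(f"{label}: E(parallel) = {e_par:+.2f} meV, "
          f"E(antiparallel) = {e_anti:+.2f} meV -> prefers {preferred}")
```

Under this convention the Mn(1) spin lowers its energy by pointing opposite to its neighbours and the Mn(2) spin by pointing along them; in both cases the eight neighbours share a common orientation relative to the wall spin.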

To understand the magnetic properties of this novel 2D phase, we used 
the exchange constants obtained from DFT+U calculations to simulate
the magnetic ordering in the film (Methods). Figure 5 shows the minimum- 
energy configuration of Mn spins in two neighbouring domains and in 
the domain walls separating them (one unit cell thick), viewed from the 
[001] direction. The red and blue arrows respectively indicate the ori- 
entations of spins in the upper and lower Mn layers of the double unit 
cell in the domains, and the magenta arrows correspond to spins in the 
domain walls. Spins inside the domains show the A-type antiferromag- 
netic ordering (layers of parallel spins coupled antiferromagnetically 
along the [001] direction) rather than the spiral ordering found in bulk 
TbMnO3 (ref. 27), because the compressive strain in the film relieves
magnetic frustration (Methods). Because the [100] and [010] axes in
neighbouring domains are interchanged, spins form ‘90° antiferromag- 
netic domain walls’, on either side of which the spin directions differ 
by 90°. As in the bulk material, the magnetization inside the domains 
cancels owing to the antiparallel arrangement of spins in neighbouring 
(001) layers. However, near each domain wall we find an uncompensated
magnetic moment: the exchange coupling of a Mn ion in the wall to

Figure 5 | Simulated magnetic order of the new 2D phase. The ordering
of Mn spins in two neighbouring domains and the three associated domain 
walls viewed from the [001] direction (plane view) for a spin model with 
exchange parameters taken from DFT+U simulations (Methods). The 
domains are one unit cell thick and 16 unit cells wide. The red and blue arrows 
respectively indicate the orientations of spins in the upper and lower Mn layers 
in the domains, and the magenta arrows show the direction of spins in the
domain walls. At the top is shown the net in-plane magnetic moment per
atomic plane parallel to the domain wall.

eight neighbouring spins at the domain edges favours parallel ordering
of the latter spins, independently of whether this coupling is ferromagnetic
or antiferromagnetic, thus inducing a large magnetic moment at
the ‘interface’ between the domains and the wall equal to 10.16 μB per
Mn spin in the wall. Because real samples show domain walls aligned
in two perpendicular directions, approximately half of the domain walls
will not contribute to the measured in-plane remanent magnetization.
Therefore, according to the theoretical model, a magnetic moment of
~5.1 μB per Mn spin in the wall, that is, 0.15 μB f.u.⁻¹, should be detected
in our experiments (Methods). This is in very good agreement with the
~0.10 μB f.u.⁻¹ found experimentally. A smaller experimental value is
expected because domain wall pinning, domain dynamics and demag- 
netization fields are not taken into account by the model. The long- 
range magnetodipolar interactions will then favour parallel in-plane 
magnetic moments, that is, the ferromagnetic state. 

We have described a route to synthesizing novel 2D phases by taking 
advantage of the large stresses present at crystallographic domain walls 
of epitaxially strained complex oxides. This approach should work in 
other epitaxial, [001]-oriented orthorhombic A³⁺B³⁺O3 perovskites
under compressive strain, especially in those containing multivalence 
B cations that offer higher flexibility for chemical interactions, and in 
those showing discontinuity in the tiltings of the oxygen octahedra at 
the domain walls. Moreover, the separation between the 2D sheets can 
be tuned, which makes them of potentially great interest in spintronic 
and electronic devices such as spin valves or magnetic storage media. 
We believe that this work opens a new route for the synthesis of diverse 
chemical environments in complex oxides. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

Conjugated polymers enable the production of flexible semiconductor
devices that can be processed from solution at low temperatures.
Over the past 25 years, device performance has improved greatly as a
wide variety of molecular structures have been studied. However,
one major limitation has not been overcome: transport properties
in polymer films are still limited by pervasive conformational and
energetic disorder. This not only limits the rational design of materials
with higher performance, but also prevents the study of physical
phenomena associated with an extended π-electron delocalization
along the polymer backbone. Here we report a comparative transport
study of several high-mobility conjugated polymers by field-effect-
modulated Seebeck, transistor and sub-bandgap optical absorption
measurements. We show that in several of these polymers, most notably
in a recently reported, indacenodithiophene-based donor-acceptor
copolymer with a near-amorphous microstructure, the charge transport
properties approach intrinsic disorder-free limits at which all
molecular sites are thermally accessible. Molecular dynamics simu- 
lations identify the origin of this long sought-after regime as a planar, 
torsion-free backbone conformation that is surprisingly resilient to 
side-chain disorder. Our results provide molecular-design guidelines 
for ‘disorder-free’ conjugated polymers. 

In several donor-acceptor copolymers, surprisingly high field-effect
mobilities >1 cm² V⁻¹ s⁻¹ have recently been found despite the
microstructure of these polymers being less ordered than those of
crystalline or semicrystalline polymers, such as poly-3-hexylthiophene
(P3HT) or poly(2,5-bis(3-alkylthiophen-2-yl)thieno(3,2-b)thiophene)
(PBTTT), and in some cases being near amorphous. The high mobilities
have been attributed to a network of tie chains providing interconnecting
transport pathways between crystalline domains, but this does not
fully explain how these polymers can exhibit significantly higher
mobilities than P3HT or PBTTT. To probe energetic disorder in these
systems, we investigate the Seebeck coefficient α, which can be determined
experimentally by measuring the electromotive force EMF that develops
across a material in response to an applied temperature differential
ΔT as follows: α = EMF/ΔT. For small carrier concentration, as in the
experiments reported here, the dominant contribution to α is the entropy
of mixing associated with adding a carrier into the density of states,
which is determined by the density of thermally accessible transport
states. If the energetic dispersion is less than kBT (kB, Boltzmann’s
constant) then the density of thermally accessible states will be temperature
independent and equal to the density of molecular sites. By contrast,
if the energetic dispersion among hopping sites is much greater
than kBT then the density of thermally accessible states will increase as
the temperature is raised. Thus, we can estimate the energetic disorder
relative to kBT associated with transport by measuring the temperature
dependence of the Seebeck coefficient of field-effect transistors (FETs)
which independently control the carrier density.


We have investigated a range of state-of-the-art diketopyrrolopyrrole
(DPP) and isoindigo copolymers, and here show results for
PSeDPPBT and DPPTTT with mobilities of 0.3-0.5 cm² V⁻¹ s⁻¹
and 1.5-2.2 cm² V⁻¹ s⁻¹, respectively (for the chemical structures of
PSeDPPBT and DPPTTT, see Supplementary Fig. 13a). PBTTT serves
as a semicrystalline polymer reference system. Among the many polymers
we investigated, we find the lowest degree of energetic disorder in
indacenodithiophene-co-benzothiadiazole (IDTBT). IDTBT is a highly
soluble polymer (Supplementary Information section 1) exhibiting high
field-effect mobilities despite a lack of long-range crystalline order.
Top-gate IDTBT FETs with films annealed at 100 °C and Cytop gate
dielectrics reliably exhibit near-ideal performance: a low threshold
voltage of Vth = −3 V, a low contact resistance (Fig. 1a) and a high saturation
mobility of 1.5-2.5 cm² V⁻¹ s⁻¹ extracted from a near-ideal, quadratic
current dependence on gate voltage.

These mobility values are lower than the highest values claimed in 
the literature. On the one hand, there is ongoing debate about the
possible overestimation of mobilities in polymer FETs owing to deviations
from the ideal in their electrical characteristics. All mobility
values reported here were conservatively estimated. Artefacts related to 
contact resistance make it possible, for example, to extract mobilities 
up to an order of magnitude higher from non-optimized IDTBT devices 
with non-ideal electrical characteristics (Supplementary Information 
section 2). On the other hand, we have restricted ourselves to top-gate 
FETs with spin-coated films and have not used techniques that may 
enhance mobilities for certain materials by increasing the interfacial 
orientation or alignment relative to that present in the bulk. This
enables us to correlate interface-sensitive FET Seebeck measurements 
with bulk-sensitive optical spectroscopy. 

Among the polymers investigated, IDTBT had not only one of the 
highest mobilities, if not the highest, but also the most-ideal electrical 
characteristics (Supplementary Information section 2). This is evident 
in the temperature-dependent relationship between drain current ID and gate
voltage Vg in the saturation regime, which was fitted to ID ∝ (Vg − Vth)^γ
between 200 and 300 K. For IDTBT, the exponent γ takes the ideal,
temperature-independent value 2 (Fig. 1b). By contrast, γ increases
with decreasing temperature as γ = T0/T + 1 for PBTTT, PSeDPPBT
and DPPTTT, which is commonly observed in polymer FETs and is
interpreted in terms of carriers hopping within an exponential density
of states with characteristic width kBT0 > kBT (ref. 22). Concomitantly,
the mobility rises on increasing the magnitude of the gate voltage as
trap states within the band tail are progressively filled (Supplementary
Information section 2). Whereas this disorder model fits the other
polymers, it does not fit the IDTBT FET data even when T0 is taken
to be as small as 330 K. We know of no prior report of such ideal (γ = 2)
behaviour for a polymer FET. The IDTBT transfer characteristics are
well fitted over the entire temperature range with a disorder-free,

Figure 1 | Transistor characteristics of IDTBT-based FETs compared with
other polymer FETs. a, Room-temperature output characteristics and device
architecture of a typical IDTBT organic FET with channel length L = 20 μm
and channel width W = 1 mm. b, γ plotted versus 1,000/T for IDTBT (structure
shown), PSeDPPBT, DPPTTT and PBTTT organic FETs. c, Temperature


metal-oxide-semiconductor FET-like model with a thermally activated, 
but gate-voltage-independent mobility (Fig. 1c). This was confirmed 
by directly extracting the gate voltage dependence of the mobility from 
the transfer characteristics of devices with patterned semiconductor 
layers to minimize leakage and fringe currents. In IDTBT, the mobility 
was nearly independent of gate voltage for |Vg| > 20 V across the entire
temperature range, whereas in PBTTT the mobility strongly increases 
with gate voltage at lower temperatures (Fig. 1d). These results suggest 
that energetic disorder is significantly lower in IDTBT than in the other 
polymers. 
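
To see why the exponential-density-of-states model cannot reproduce a temperature-independent exponent of 2, it helps to evaluate the relation γ = T0/T + 1 over the measured temperature range. The following is a minimal sketch; the function name is hypothetical, the intermediate temperature of 250 K is chosen only for illustration, and T0 = 330 K is the smallest characteristic temperature mentioned in the text.

```python
def gamma_exponent(t0_kelvin, temperature_kelvin):
    """Exponent gamma = T0/T + 1 for hopping in an exponential density of states."""
    return t0_kelvin / temperature_kelvin + 1.0

T0 = 330.0  # smallest characteristic temperature considered in the text (K)
for temperature in (300.0, 250.0, 200.0):
    gamma = gamma_exponent(T0, temperature)
    print(f"T = {temperature:.0f} K: gamma = {gamma:.2f} (ideal value: 2.00)")
```

Even with this small T0, the predicted exponent is above 2 at 300 K and grows further on cooling to 200 K, whereas the IDTBT data stay at γ = 2.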

To accurately measure the Seebeck coefficients of FETs with 20-50 μm
channel lengths as functions of gate voltage and temperature, we developed
a microfabricated device architecture with an integrated heater
and temperature sensors positioned along the FET’s channel (Supplementary
Information section 3). The carrier concentrations n in the
accumulation layer were estimated from measurements of capacitance
versus gate voltage (Supplementary Information section 4). We find
Seebeck coefficients (Fig. 2) that are much larger than kB/e ≈ 86 μV K⁻¹
(e, elementary charge), that decrease with increasing carrier
concentration n and that are independent of temperature between
200 and 300 K within the measurement error. Temperature-independent
Seebeck coefficients over a similar temperature range have been reported
previously only for single crystals of the molecular semiconductors
pentacene and rubrene.

We have attempted to interpret the Seebeck and FET measurements 
as functions of temperature consistently in terms of the variable-range 
hopping disorder model used in ref. 24, that is, akin to models used to 
explain analogous measurements in amorphous silicon. For PBTTT
and PSeDPPBT this may be possible, but the fits depend on several 
unknown parameters and, as discussed above, the disorder model breaks 
down for IDTBT (Supplementary Information section 2). A simpler,

evolution of IDTBT transfer curves fitted with a disorder-free MOSFET model
(drain voltage, VD = −60 V). d, Gate-voltage dependence of saturation
mobility μ at 300 and 240 K for patterned IDTBT (top) and PBTTT
(bottom) devices.


more consistent interpretation of the three salient Seebeck features that 
is applicable to all polymers is given by a narrow-band model in which 
charge carriers experience a small degree of energetic disorder and are 
able to access a temperature-independent density of thermally accessible
sites. The narrowness of the carriers’ energy bands is probably due
to polaron formation, as supported by charge accumulation spectroscopy
(Supplementary Information sections 5 and 6). In the simplest
narrow-band model, the Seebeck coefficient can be expressed as the
sum of three contributions (Supplementary Information section 7):


$$\alpha = \frac{k_B}{e}\ln\left(\frac{N - n_c}{n_c}\right) + \frac{k_B}{e}\ln(2) + \alpha_{\mathrm{vib}} \qquad (1)$$


The first contribution is the change of the entropy of mixing when the
density of mobile polarons is n_c and the density of thermally accessible
sites is N. The second contribution is the entropy change arising from
the twofold spin degeneracy. The final term is the high-temperature
limit of the entropy change produced by a polaron altering the stiffness
or frequencies of the molecular vibrations. Only the first contribution
depends explicitly on carrier density. Because in our organic FETs n_c ≪ N,
the primary contribution to the Seebeck coefficient comes from the mixing
contribution. Thus, a plot of α versus the logarithm of the mobile
carrier density should yield a straight line with slope −(kB/e)ln(10) =
−198 μV K⁻¹ decade⁻¹. It is evident from Fig. 2b that the slopes of the
near-linear, experimental α-log(n) plots depend on the specific polymer
and exceed this value. These discrepancies can be reconciled by taking
into account that a fraction f of the n injected carriers are trapped in
shallow traps and do not participate in transport. Then n_c = n(1 − f)
and the slope of the α-log(n) plot is increased to −(kB/e)ln(10)/(1 − f).
This procedure is justified if these band-tail-like traps are within ~kBT
of the narrow band of conducting polaron states. We extract values of
f = 0.3, 0.5 and 0.7 for IDTBT, PBTTT and PSeDPPBT, respectively.

Figure 2 | Field-effect-modulated Seebeck coefficients in high-mobility
polymer devices. a, Temperature independence of the field-effect-modulated
Seebeck coefficients of PBTTT and IDTBT. b, Slopes of the Seebeck coefficients
versus the logarithm of carrier concentration in the accumulation region for
IDTBT, PBTTT and PSeDPPBT at 300 K. The solid lines in b are plots of
α = (kB/e)ln(2N/n). The carrier concentration in IDTBT is slightly lower than
in the other polymers because of the Cytop gate dielectric used, which has a low
dielectric constant. The measurement error of the Seebeck coefficient is
estimated to be 70 μV K⁻¹ for the IDTBT device (Supplementary
Information section 3).


Thus, our Seebeck measurements indicate significantly less trapping in 
IDTBT than in PBTTT or PSeDPPBT; in IDTBT the majority of charge
carriers reside in mobile states. 
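
The characteristic numbers in this argument follow directly from fundamental constants. The sketch below evaluates kB/e and the slope expression quoted in the text, (kB/e)ln(10)/(1 − f), for the three extracted trapped fractions. It is an illustration only: the function name is hypothetical, the constants are the exact SI values, and slopes are reported as magnitudes.

```python
import math

K_B = 1.380649e-23            # Boltzmann constant (J/K), exact SI value
E_CHARGE = 1.602176634e-19    # elementary charge (C), exact SI value

KB_OVER_E_UV = K_B / E_CHARGE * 1e6   # k_B/e in microvolts per kelvin

def slope_per_decade(trapped_fraction=0.0):
    """Magnitude of d(alpha)/d(log10 n) in uV/K per decade, using the text's
    expression (k_B/e) ln(10) / (1 - f)."""
    if not 0.0 <= trapped_fraction < 1.0:
        raise ValueError("trapped_fraction must lie in [0, 1)")
    return KB_OVER_E_UV * math.log(10.0) / (1.0 - trapped_fraction)

print(f"k_B/e = {KB_OVER_E_UV:.1f} uV/K")
print(f"trap-free slope = {slope_per_decade():.0f} uV/K per decade")
for polymer, f in (("IDTBT", 0.3), ("PBTTT", 0.5), ("PSeDPPBT", 0.7)):
    print(f"{polymer}: f = {f} -> slope = {slope_per_decade(f):.0f} uV/K per decade")
```

The trap-free value reproduces the 198 μV K⁻¹ per decade quoted above, and the slope magnitude grows as the trapped fraction increases.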

To interpret the magnitude of the Seebeck coefficients, we estimate 
the number of equivalent sites in our polymers. By assuming there to 
be one equivalent site on each polymer repeat unit, we obtain N =
7.4 × 10²⁰ cm⁻³ (IDTBT) and N = 8.9 × 10²⁰ cm⁻³ (PBTTT) on the basis
of reported unit cell parameters. The solid black and red lines in
Fig. 2b show the resulting estimates of the Seebeck coefficients for IDTBT
and PBTTT, respectively, on ignoring the carrier-induced changes in
these molecules’ vibrations. The small discrepancies between the solid
lines of Fig. 2b and the experimental data may indicate the vibrational
contribution. This interpretation yields 50-100 μV K⁻¹ for the vibrational
contribution of IDTBT. This appears reasonable, although smaller
than what has been reported for pentacene (265 μV K⁻¹; ref. 27) or
boron carbides (200 μV K⁻¹ at 300 K; ref. 28).

The small degree of disorder in IDTBT is also consistent with optical 
absorption measurements by photothermal deflection spectroscopy 
(Supplementary Information section 8). This technique provides a 
bulk-sensitive way of probing energetic disorder manifesting itself as 
sub-bandgap tail states of the excitonic joint density of states and of 
estimating their widths in terms of the Urbach energy, E_u, extracted
from the optical absorption coefficient in the vicinity of the band gap
E_g: a(E) = a0 exp((E − E_g)/E_u) for E < E_g. For more disordered polymers,
E_u has previously been found to correlate with the T0 values extracted
from fits of device characteristics according to an empirical relationship
E_u ≈ kBT0 (ref. 17). Among the ~20 high-mobility polymers measured
in this work (examples in Fig. 3 and Supplementary Fig. 13), IDTBT
exhibits the lowest Urbach energy of 24 meV, which is less than kBT at
room temperature and, to the best of our knowledge, is the lowest value
reported in a conjugated polymer. Notably, the second- and third-lowest
values are also measured in high-mobility polymers, naphthalenediimide-
based P(NDI2OD-T2) (E_u = 31 meV) and DPPTTT (E_u = 33 meV).
This should be compared with PBTTT (E_u = 47 meV).
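
The comparison with the thermal energy can be made explicit. The sketch below takes room temperature to be 300 K (an assumption; the text does not give a number) and also converts each Urbach energy into an equivalent temperature E_u/kB, in the spirit of the empirical relation E_u ≈ kBT0. Variable and function names are hypothetical.

```python
K_B_MEV_PER_K = 8.617333262e-2   # Boltzmann constant in meV/K

# Urbach energies quoted in the text (meV).
urbach_mev = {"IDTBT": 24.0, "P(NDI2OD-T2)": 31.0, "DPPTTT": 33.0, "PBTTT": 47.0}

def thermal_energy_mev(temperature_kelvin):
    """Return k_B * T in meV."""
    return K_B_MEV_PER_K * temperature_kelvin

kbt_300 = thermal_energy_mev(300.0)   # room temperature taken as 300 K (assumption)
print(f"k_B T at 300 K = {kbt_300:.1f} meV")
for polymer, e_u in urbach_mev.items():
    relation = "below" if e_u < kbt_300 else "above"
    print(f"{polymer}: E_u = {e_u:.0f} meV ({relation} k_B T), "
          f"E_u / k_B = {e_u / K_B_MEV_PER_K:.0f} K")
```

Of the four polymers listed, only IDTBT has an Urbach energy below the thermal energy at 300 K.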

Our results demonstrate that donor-acceptor copolymers without 
pronounced crystallinity can exhibit a lower degree of energetic disorder 
than crystalline or semicrystalline conjugated polymers; it is important 


386 | NATURE | VOL 515 | 20 NOVEMBER 2014 


[Figure 3 plot: absorption coefficient (logarithmic scale) versus photon energy E (eV), 1.0-3.0 eV. Legend: PBTTT (I), PSeDPPBT (II), DPPTTT (III), IDTBT (IV).]
Figure 3 | Energetic disorder probed using photothermal deflection 
spectroscopy. Absorption coefficient of IDTBT, DPPTTT, PSeDPPBT and 
PBTTT films, measured by photothermal deflection spectroscopy. Solid lines 
represent exponential tail fits for extraction of the Urbach energies E, (inset). A 
relative error of 5% in the value of E, was estimated to result from uncertainty in 
the fitting procedure. 


to understand the underlying microstructural origin for this. We are 
also interested in whether IDTBT’s exceptional properties originate in 
certain unique molecular design features that may not yet be imple- 
mented to the same degree in other polymers with comparable mobi- 
lities but with otherwise less ideal transport characteristics. IDTBT 
cannot simply be understood as a classical rigid-rod polymer; its high 
solubility in a wide range of solvents suggests a degree of chain flexibility 
that is not common for such polymers (Supplementary Information 
section 1). To understand these matters better, we have modelled the 
three-dimensional structures of IDTBT, P(NDI2OD-T2) and PBTTT by
combining quantum chemical and molecular dynamics calculations
(Supplementary Information section 9). The conformational search
points to interdigitated side chains as the thermodynamic, lowest-energy
structures in the three polymers (Supplementary Fig. 14). However, in
contrast to PBTTT, for IDTBT the X-ray pattern simulated for such a
dense, ordered, interdigitated side-chain arrangement is not in agreement
with experimental data. Instead, much better agreement with the
measured X-ray diffraction is obtained when a less dense, disordered,
non-interdigitated side-chain arrangement is built from numerical annealing
experiments (Supplementary Fig. 15). A similar protocol was also
applied to simulate side-chain disorder in P(NDI2OD-T2) and PBTTT.
In relation to their crystalline phases, the backbone conformations in
these disordered structures differ significantly between the polymers
(Fig. 4a): IDTBT adopts a wavy, yet remarkably planar, largely torsion-free
backbone; the deviation from planarity remains exceptionally small
(torsion angle of 5.2 ± 4.0°). P(NDI2OD-T2) behaves similarly; although
it is not a planar molecule, the torsion-angle distribution between the
NDI and thiophene units remains relatively narrow (38.2 ± 10.7°). In
contrast, PBTTT chains, while maintaining a linear conformation, explore
a broader range of torsion angles (27.2 ± 14.6° between thiophene and
thienothiophene).

We have direct experimental evidence for a near-torsion-free back- 
bone in IDTBT from pressure-dependent Raman spectroscopy (Fig. 4b 
and Supplementary Information section 10). If there were significant 
torsion in as-deposited films, the backbone could be planarized by applying 
a hydrostatic pressure of a few gigapascals, as previously observed for 
structurally related poly-dioctylfluorene-co-benzothiadiazole (F8BT), 
and the Raman intensity ratio between the ring stretching mode of the 
IDT unit at 1,613 cm⁻¹ and the ring stretching mode of the BT unit at 
1,542 cm⁻¹ would be expected to be pressure dependent. However, we 
find experimentally that this ratio is remarkably pressure independent 
between 0 and 2.5 GPa, suggesting that the IDTBT backbone is indeed 
already planar in as-deposited films. 
Figure 4 | Resilience of torsion-free polymer backbone conformation to 
side-chain disorder. a, Simulations of the backbone conformation of IDTBT 
and PBTTT in side-chain-disordered and non-interdigitated structures. The 
side chains and hydrogen atoms are omitted for clarity. Yellow, sulphur atoms; 
blue, nitrogen atoms. b, Pressure dependence of the intensity ratio of the 
Raman transitions at 1,542 cm⁻¹ and 1,613 cm⁻¹ (top) and the Raman 


The frontier orbitals of the three theoretically investigated polymers 
are spread along the backbones (Supplementary Fig. 20), such that con- 
formational disorder is expected to broaden the density of states (DOS). 
We have calculated the tail width of the DOS of the highest occupied 
molecular orbital (HOMO) in IDTBT to be the least affected by side- 
chain disorder; likewise for the DOS of the lowest unoccupied molecular 
orbital (LUMO) of P(NDI2OD-T2), here partly because of the stronger 
confinement of the LUMO on the NDI units. In contrast, the HOMO 
DOS of PBTTT broadens significantly on introducing side-chain dis- 
order (Table 1). Remarkably, even in a completely amorphous phase simu- 
lated by cooling low-density systems made of initially highly energetic, 
randomly distributed oligomers (Supplementary Information section 9), 
IDTBT accommodates side-chain disorder through bends in the back- 
bone while retaining its near-planar conformation (Fig. 4c); its DOS is 
not significantly broadened. In contrast, the other two polymers, in 
particular PBTTT, adopt conformations with larger spans in torsion 
angles and wider DOSs. The relative trend in disorder resilience evid- 
ent from Table 1 is remarkably consistent with the measured Urbach 
energies and transport properties. 

Our results provide an explanation for the surprisingly high mobilities 
in donor-acceptor copolymers with less crystalline microstructures than 
crystalline or semicrystalline P3HT or PBTTT, in terms of a low degree 
of energetic disorder originating in a remarkable resilience of the back- 
bone conformation to side-chain disorder, which is inevitable when 
thin films are solution-deposited by rapid drying techniques. The excep- 
tional properties of IDTBT suggest several cooperating molecular design 
guidelines for discovering a wider class of such ‘disorder-free’ conjugated 
polymers: (1) collinear conjugated units with only a single or a 
spectrum of IDTBT measured using a diamond-anvil cell (bottom). a.u., 
arbitrary units. c, Simulation of the backbone conformation of IDTBT in the 
amorphous phase. A single chain from the simulated unit cell has been 
highlighted in bright yellow (other colours as in a). d, Calculated gas-phase 
torsion potentials of IDTBT and PBTTT. For PBTTT, the potential for torsion 
between the thiophene and thienothiophene units is shown. 


minimal number of torsion-susceptible linkages in an extended repeat 
unit (also, the electronic structure will tend to be less susceptible to 
residual torsions for larger conjugated units); (2) a relatively steep gas- 
phase torsion potential with minima ideally (though not necessarily) 
around 180°, 0° or both (Fig. 4d); and (3) long side-chain substitution 
on both sides of one of the conjugated units to enable space filling in 
non-interdigitated structures without introducing backbone torsion 
and hindering close π–π contacts. Transport in such torsion-free polymers 
is approaching intrinsic limits, in which all molecular sites along 
the polymer backbone are thermally accessible; even higher mobilities 
might be achievable in this regime through closer π–π contacts. The 
level of energetic disorder as measured by the Urbach energy is com- 
parable to that of certain inorganic crystals, such as GaN (ref. 33). That 
this is possible in near-amorphous polymers is highly surprising. Our 
results could lead to a new generation of disorder-free conjugated poly- 
mers with improved charge, exciton, spin and other transport prop- 
erties for a broad range of applications, and to the observation of physical 
phenomena that have hitherto been prevented by disorder-induced 
localization. 


Table 1 | DOS broadening induced by side-chain disorder 


Microstructure   IDTBT         P(NDI2OD-T2)   PBTTT 
                 HOMO (meV)    LUMO (meV)     HOMO (meV) 
Crystalline      26            33             30 
Disordered       31            44             48 
Amorphous        31            69             108 


Values of the widths of the tails of the DOSs extracted by fitting the simulated DOSs of the different 
polymers/phases to an exponential function. 
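
The table note says only that the tail widths come from fitting the simulated DOSs to an exponential function; the fitting code itself is not given. As a minimal sketch of what such a fit could look like, assuming a tail of the form g(E) = A exp(-E/E0) with E measured from the band edge into the gap, the width E0 can be recovered from a straight-line fit to log g. The function name `fit_tail_width` and the synthetic data are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def fit_tail_width(energies_mev, dos):
    """Return E0 (meV) from a least-squares fit of dos ~ A * exp(-E / E0).

    Hypothetical helper: the fit is done on log(dos), so only strictly
    positive DOS values are used.
    """
    energies_mev = np.asarray(energies_mev, dtype=float)
    dos = np.asarray(dos, dtype=float)
    mask = dos > 0
    # log(dos) = log(A) - E / E0, so the slope of the line is -1 / E0.
    slope, _intercept = np.polyfit(energies_mev[mask], np.log(dos[mask]), 1)
    return -1.0 / slope


# Synthetic tail with a 31 meV width (the value listed for disordered IDTBT).
energies = np.linspace(0.0, 200.0, 101)
synthetic_dos = 2.5 * np.exp(-energies / 31.0)
width = fit_tail_width(energies, synthetic_dos)
print(f"fitted tail width: {width:.1f} meV")
```

On real simulated DOS data the fit window would have to be restricted to the tail region, which is a choice this sketch does not address.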
Overcoming the limitations of directed C-H 
functionalizations of heterocycles 


Yue-Jin Liu, Hui Xu, Wei-Jun Kong, Ming Shang, Hui-Xiong Dai & Jin-Quan Yu 


In directed C-H activation reactions, any nitrogen or sulphur atoms 
present in heterocyclic substrates will coordinate strongly with metal 
catalysts. This coordination, which can lead to catalyst poisoning or 
C-H functionalization at an undesired position, limits the application 
of C-H activation reactions in heterocycle-based drug discovery, 
in which regard they have attracted much interest from pharmaceutical 
companies. Here we report a robust and synthetically useful 
method that overcomes the complications associated with performing 
C-H functionalization reactions on heterocycles. Our approach 
employs a simple N-methoxy amide group, which serves as both a 
directing group and an anionic ligand that promotes the in situ generation 
of the reactive PdX₂ (X = ArCONOMe) species from a Pd(0) 
source using air as the sole oxidant. In this way, the PdX₂ species is 
localized near the target C-H bond, avoiding interference from any 
nitrogen or sulphur atoms present in the heterocyclic substrates. This 
reaction overrides the conventional positional selectivity patterns 
observed with substrates containing strongly coordinating hetero- 
atoms, including nitrogen, sulphur and phosphorus. Thus, this oper- 
ationally simple aerobic reaction demonstrates that it is possible to 
bypass a fundamental limitation that has long plagued applications 
of directed C-H activation in medicinal chemistry. 

Heterocycles are commonly found in drug candidates owing to their 
ability to improve solubility and reduce the lipophilicity of a drug 
molecule. The potential application of C-H activation technologies 
in the rapid synthesis and diversification of novel heterocycles has 
attracted widespread attention from the pharmaceutical industry. One 
of the most significant challenges in the application of C-H functionalization 
reactions is achieving robust control of positional selectivity. 
Directed C-H metalation has recently emerged as a reliable approach 
for achieving a diverse collection of selective C-H functionalization reactions, 
and activation of both proximate and remote C-H bonds has 
proven feasible. The use of a weakly coordinating functional group to 
achieve high effective molarity of the catalyst around the C-H bond of 
interest has greatly expanded the substrate scope of these processes. 
Unfortunately, these C-H functionalization processes are generally 
incompatible with the majority of medicinally important heterocyclic 
substrates because the heteroatoms can interfere with the catalyst. For 
example, two strategies have recently been developed to protect pyridines 
with Lewis acid or N-oxide formation in order to prevent the classic 
cyclopalladation and perform the desired allylic C-H acetoxylation. 
In directed C-H activation, strongly coordinating nitrogen, sulphur and 
phosphorus heteroatoms often outcompete the directing groups for 
catalyst binding, thus preventing activation of the C-H bonds proximate 
to the directing groups (Fig. 1a). When coordinated to a heterocycle, the 
catalyst is either unreactive due to the lack of a proximate C-H bond or 
only capable of activating the C-H bonds adjacent to the coordinating 
heteroatom. This inherent drawback of directed C-H activation, especially 
with Pd(II) catalysts, is currently a major obstacle to widespread 
application of C-H functionalization in heterocycle-based medicinal 
chemistry. Similarly, C-H functionalization of heterocycles using non-directed 
approaches has found limited success in terms of substrate scope and 
efficiency. 

Here we report an aerobic C-H functionalization reaction that effectively 
overcomes catalyst poisoning by heterocycles and overrides 
the commonly observed positional selectivity dictated by heterocycles. 
The catalytic cycle begins with the on-site generation of a reactive Pd(II) 
species (Fig. 1b). To this end, a Pd(0) precursor coordinates with a 
simple, carboxylic-acid-derived N-methoxy amide directing group 
(CONHOMe), which promotes subsequent oxidation of Pd(0) to Pd(II) 
by air present in the reaction mixture. The directing group is the only 
anionic X-type ligand in the reaction mixture that can be incorporated 
into the resulting PdX₂ species. Thus, any Pd(0) species in solution that 
are transiently coordinated to a neutral σ-donor heterocycle (L-type 
ligand) must migrate to the CONHOMe directing group in order to form 
the reactive PdX₂ species, which then cleave adjacent C-H bonds, thereby 
bypassing the adverse effects of heterocycles. Remarkably, the commonly 
Figure 1 | Development of a catalytic system to overcome fundamental 
limitations of heterocyclic C-H bond functionalizations. a, Strong 
coordination between Pd(II) catalysts and heterocycles poisons catalysts or 
restricts positional selectivity. DG, directing group. b, Our approach involves 
avoiding heterocycle poisoning via on-site generation of Pd(II) catalysts. 
CONHOMe (shown red) is a practical directing group; the C-H bond shown 
blue was previously difficult to activate; and air is a practical oxidant. 
c, Overriding the conventional positional selectivity directed by heterocycles. 
Five representative heterocycles are shown that are known to direct facile 
cyclopalladation reactions. 
observed positional selectivity patterns dictated by the well-known 
cyclopalladation in heterocycles are overridden (Fig. 1c), even when 
C-H bonds are present ortho to strongly coordinating heteroatoms. 
Since C-H palladation is often the selectivity-determining step, we anti- 
cipate this switch of positional selectivity could be extended to other 
C-H activation transformations on further development. 

We began our investigations using CONHOMe (ref. 27). A lack of 
heterocyclic substrates among extensive reports on the use of this other- 
wise powerful directing group indicates widespread heterocycle poison- 
ing in directed C-H activation. To verify this assessment, we performed 
an extensive survey by applying the previously reported reactions using 
the N-methoxy amide directing group to representative heterocyclic 
substrates shown in Fig. 1c. We found that no protocol was compatible 
with these heterocyclic substrates (for details, see Supplementary Information). 
We surmised that a novel approach would be needed to overcome 
the strong coordination of Pd(II) species with heterocycles. Pd(II)X₂ 
catalysts are known to strongly coordinate with neutral σ-donors such 
as pyridines. On the other hand, Pd(0) species possess a comparatively 
weaker affinity for this type of ligand because they are more nucleophilic 
than the Pd(II) catalysts. We, therefore, focused on the design of 
a catalytic system that would begin with Pd(0) species, which could 
coordinate comparatively weakly with both pyridine and the directing 
group in a reversible manner. We hypothesized that a specifically designed 
anionic directing group, if coordinated to Pd(0) species, could accelerate 
the generation of the reactive Pd(II)X₂ species if this directing group 
were the sole X-type ligand in the reaction mixture (Fig. 1b). Once 
generated on-site, the resulting PdX₂ species could potentially cleave a 
C-H bond adjacent to the directing group before being scavenged by 
the pyridyl group. In essence, the pyridyl group would serve as a Pd(0) 
reservoir, rather than poisoning Pd(II). To establish the feasibility of this 
approach, we used a simple arene substrate 1a (Fig. 2a) and employed 
CONHOMe to develop a highly efficient C-H functionalization reaction 
in the presence of a catalytic amount of Pd₂(dba)₃ (dba, dibenzylideneacetone). 
We anticipated that Pd(0) would be converted to 
Pd(II)(ArCONOMe)₂ in the presence of an oxidant. Air was identified as an 
ideal oxidant in that it would avoid the introduction of other anions. 
Through extensive screening (see Supplementary Information), we found 
that arene 1a reacts with 1.5 equiv. of isocyanide 2 in the presence of 
2.5 mol% Pd₂(dba)₃ in 1,4-dioxane under 1 atm air at 80 °C for 30 min 
to give ortho-functionalized 3-(imino)isoindolinone 3a in 93% isolated 
yield (Fig. 2a). 
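
The optimized conditions quote loadings relative to the arene (1.5 equiv. of isocyanide, 2.5 mol% Pd₂(dba)₃). The short sketch below converts those ratios into molar amounts for a chosen scale and notes that each Pd₂(dba)₃ unit supplies two palladium atoms, so 2.5 mol% of the dimer corresponds to 5 mol% Pd. The function name `reagent_amounts` and the 0.1 mmol scale are illustrative assumptions; the actual scale is given only in the Supplementary Information.

```python
def reagent_amounts(substrate_mmol, isocyanide_equiv=1.5, pd2dba3_mol_percent=2.5):
    """Convert the stated equivalents and mol% into mmol for a given scale.

    Hypothetical helper; defaults are the loadings quoted in the text.
    """
    return {
        "isocyanide_mmol": substrate_mmol * isocyanide_equiv,
        "pd2dba3_mmol": substrate_mmol * pd2dba3_mol_percent / 100.0,
        # Pd2(dba)3 carries two Pd atoms per formula unit.
        "pd_atom_mol_percent": 2 * pd2dba3_mol_percent,
    }


# Assumed 0.1 mmol scale, for illustration only.
amounts = reagent_amounts(0.1)
for name, value in amounts.items():
    print(f"{name}: {value:.4f}")
```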

The structure of 3a was unexpected based on earlier precedents in 
isocyanide insertion chemistry, indicating the involvement of a new 
isocyanide insertion pathway. To rationalize the formation of 3a, we 
reacted 2,6-difluoro-N-methoxybenzamide (A) with 25 mol% Pd₂(dba)₃ 
under the reaction conditions given in Supplementary Information, 
attempting to identify potential Pd(II) intermediates before the C-H activation 
event (Fig. 2b). We were able to characterize a new C-amidinyl 
Pd(II) species E by X-ray crystallography, which allows us to propose 
an intriguing reaction pathway. We speculate that the initially formed 
Pd(II) species B undergoes migratory insertion with t-BuNC to give C, 
which then rearranges to form C-amidinyl Pd(II) precursor D. The chloride 
in E is probably incorporated from the CHCl₃ contained in commercial 
Pd₂(dba)₃ via anionic exchange with D. In hindsight, it is crucial 
that the unexpected C-amidinyl Pd(II) species D or E is able to cleave the 
C-H bonds in a highly efficient manner. 

The use of air as an oxidant is essential for this transformation (Fig. 2a, 
entries 1, 2). Interestingly, a significantly lower yield is obtained when 
the reaction is conducted under O₂ (1 atm). Presumably, in high concentration, 
O₂ can intercept one of the intermediates in the catalytic 
cycle. The efficiency of this catalytic system was further demonstrated 
by running the reaction on a gram-scale, using 0.5 mol% Pd₂(dba)₃ to 
afford product in 89% isolated yield, albeit with a prolonged reaction 
time (24 h) (Fig. 2c). To demonstrate the synthetic utility of this C-H 
functionalization process, 3a was readily converted to a number of 
synthetically versatile building blocks, including an ester, an amine and 
Figure 2 | Discovery of an efficient aerobic C-H activation reaction. 
a, A catalytic C-H activation reaction using air as the sole oxidant. See 
Supplementary Information for experimental details; yields were determined 
by ¹H NMR analysis with dibromomethane as an internal standard; the yield in 
parentheses in column 4 is the isolated yield. b, Characterization of a 
reactive C-amidinyl Pd(II) intermediate. The reaction scheme shows on-site 
generation of Pd(II) precatalyst B by air oxidation; migratory insertion into 
isocyanide to form C; acyl migration leading to D; and anion exchange to give 
E. c, Gram-scale reaction and diverse transformations. Red text highlights 
low catalyst loading and the use of air as the inexpensive oxidant. 


a lactam, via one- or two-step procedures (Fig. 2c; see Supplementary 
Information for details). 
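
The efficiency claim for the gram-scale run (0.5 mol% Pd₂(dba)₃, 89% isolated yield) can be put on a common footing with the standard run (2.5 mol% Pd₂(dba)₃, 93% isolated yield) by estimating turnover numbers. The sketch below is a back-of-the-envelope calculation, not a figure reported in the text; it assumes that both Pd atoms of each Pd₂(dba)₃ unit are catalytically available, and the function name is hypothetical.

```python
def turnover_number(yield_percent, pd2dba3_mol_percent):
    """Moles of product formed per mole of Pd atoms.

    Assumes both Pd atoms of each Pd2(dba)3 unit take part in catalysis.
    """
    pd_atom_mol_percent = 2 * pd2dba3_mol_percent
    return yield_percent / pd_atom_mol_percent


ton_standard = turnover_number(93.0, 2.5)    # standard conditions
ton_gram_scale = turnover_number(89.0, 0.5)  # gram-scale run
print(f"standard: {ton_standard:.1f}, gram-scale: {ton_gram_scale:.1f}")
```

Under this assumption the gram-scale run gives roughly five times as many turnovers per palladium atom as the standard run, at the cost of the longer reaction time.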

The scope of arene substrates was surveyed using 2.5 mol% Pd₂(dba)₃ 
(Fig. 3a). A variety of substituents on the aryl ring were well tolerated 
(3a-t). These results demonstrate that the on-site generation of Pd(II) 
precursor B using air as the oxidant and subsequent C-H functionalizations 
are feasible. The fast rate of this reaction encouraged us to 
examine whether heteroatom poisoning could be overcome using this 
new reaction pathway. We found that the reaction of furans, benzofurans 
and benzothiophenes proceeds smoothly to afford the desired products 
5a-e in 86-98% yields (Fig. 3b). Indole, pyrrole, thiazole, pyrazole 
and imidazole substrates are also converted to the corresponding 
functionalized products 5f-k in good yields (74-99%). The strongly 
coordinating nitrogen atoms in pyridines and quinolines are well known 
to poison directed C-H activation under Pd(II) catalysis. Thus, the excellent 
yields obtained with various pyridine substrates (5l-q), including 
an aminated pyridine (5o), provide further evidence that this catalytic 
Figure 3 | Scope of the reaction. Top row, reagents and products. a, Directed 
C-H functionalization of arenes; for each compound, the isolated yield is 
shown in per cent, together with the duration of the reaction. b, Directed C-H 
functionalization of heterocycles; yield and duration are shown as in a. 


system can overcome severe heteroatom poisoning. Acetyl-protected 
tetrahydroquinoline- and indoline-containing substrates can also be 
functionalized, giving 5r and 5s. A free amino group is tolerated, albeit 
resulting in a lower yield (5r′, 51%). A phosphoryl group is also compatible 
(5t). 

The importance of using a Pd(0) source to enter the catalytic cycle is 
further supported by the lack of reactivity using commonly employed 
Pd(II) sources, including PdCl₂, Pd(TFA)₂ and Pd(OTf)₂, in place of 
Pd₂(dba)₃. In particular, exposing 4m, a representative 
pyridine-containing substrate, to the reaction conditions using these catalysts 
led to full recovery of starting material in the presence or absence of 
dba ligand (Fig. 3b). The desired product, 5m, was formed in 40% yield, 


For 4m, there was no reaction when Pd₂(dba)₃ was replaced by PdCl₂, 
Pd(TFA)₂ or Pd(OTf)₂; the use of 5 mol% Pd(OAc)₂ gave the desired product 
in 40% yield; see Supplementary Information for experimental details. 


however, when 5 mol% Pd(OAc)₂ was used as the catalyst. This is most 
likely to be due to the known facile reduction of Pd(OAc)₂ to Pd(0) by 
isocyanide. To seek experimental evidence in support of this reasoning, 
we stirred Pd(OAc)₂, PdCl₂, Pd(TFA)₂ and Pd(OTf)₂ separately 
with t-BuNC in dioxane at 80 °C. We found that Pd(OAc)₂ was completely 
reduced to Pd(0) within 30 min while other Pd(II) catalysts 
remained intact (for details, see Supplementary Information). To further 
demonstrate the importance of the on-site generation of PdX₂ (X = 
ArCONOMe) from Pd(0) in the absence of external anions, we also 
carried out the standard reaction in the presence of different anions, 
namely Cl⁻, TFA⁻ and OTf⁻. We found that these anions consistently 
prevent the desired reaction (see Supplementary Information). 
It is well established that substrates containing C-H bonds ortho to 
strongly coordinating heterocycles will undergo facile heterocycle-directed 
ortho-cyclopalladation. This reactivity can inhibit the activation 
of a target C-H bond that is proximate to a weaker directing group 
(here, the CONHOMe functional group), which may prevent the 
use of directed C-H functionalization reactions in substrates containing 
heterocycles. Not surprisingly, reaction of para-(2-pyridyl)benzamide 
(6a; Fig. 4a) with Pd(OAc)₂ or Pd(TFA)₂ in the absence of t-BuNC gave 
exclusively the cyclopalladation product directed by the pyridine, sug- 
gesting that pyridine is a stronger coordinating group than CONHOMe 
(for X-ray characterization of the cyclopalladation intermediate formed 
from 6a, see Supplementary Information). However, the unprecedented 
compatibility of our catalytic system with heterocyclic substrates pro- 
mpted us to examine whether our system could override the conven- 
tional heterocycle-directed cyclopalladation. 

We chose as a test substrate para-(2-pyridyl)benzamide (6a; Fig. 4a), 
which has a 2-pyridyl group para to the N-methoxy amide directing 
group. With our catalytic system, C-H functionalization proceeds ex- 
clusively at the position ortho to the CONHOMe group to provide the 
desired product 7a in 97% isolated yield. To investigate the origin of the 
observed switch of positional selectivity, we reacted 6a with various Pd(II) 
catalysts under the classic cyclopalladation conditions. As expected, 
palladation at the position ortho to the pyridyl group occurs to give the 
cyclopalladated intermediate in quantitative yield (see Supplementary 


[Figure 4 scheme. Conditions for a and b: t-BuNC (1.5 equiv.), Pd₂(dba)₃ (2.5 mol%), dioxane, 80 °C, air. 
a, Substrates 6a-g and 6h give products 7a-h: 7a, 97%, 2 h; 7b, 93%, 2 h; 7c, 90%, 2 h; 7d, 91%, 2 h; 
7e, 98%, 2 h; 7f, 85%, 2 h; 7g, 91%, 2 h; 7h, 76%, 2 h. 
b, Products 7i-p: 7i, 90%, 2 h; 7j, 85%, 2 h; 7k, 81%, 10 h; 7l, 74%, 10 h; 
7m, 88%, 2 h; 7n, 87%, 2 h; 7o, 86%, 2 h; 7p, 74%, 10 h. 
c, Lactams: 8a, 98%; 8b, 81%; 8c, 63%; 8d, 72%.] 


Figure 4 | Overriding the conventional positional selectivity dictated by 
heterocycles. a, b, Reactions of substrates prone to heterocycle-directed 
cyclometalation: top row of each panel shows the reaction, lower rows show 
isolated yields of the indicated products. c, Lactam products formed via 
hydrogenation and removal of the t-butyl groups (yields over two steps are 
shown). See Supplementary Information for experimental details. 
Information). In contrast, no traces of this intermediate can be detected 
throughout our standard reaction when Pd₂(dba)₃ is used as the catalyst. 
These experiments suggest that the use of Pd₂(dba)₃ catalyst under our 
aerobic conditions effectively avoids the conventional pyridyl-directed 
ortho-palladation pathway. We subsequently replaced the pyridine with 
other medicinally important heterocycles, including a quinoline, pyr- 
azine, pyrimidine, pyrazole and thiazole. Uniformly excellent yields 
of the desired C-H functionalization products are obtained (7b-f, 85- 
98% yield) for these substrates. In light of the well-known directing 
power of oxazoline in ortho-palladation, para- and meta-oxazolinyl 
substituted substrates 6g and 6h were also subjected to our standard re- 
action conditions. In both cases, only the desired C-H functionaliza- 
tion products are formed (7g and 7h, 91% and 76% yield, respectively). 

We further explored the utility of this catalytic system for 2- 
phenylpyridine substrates containing the CONHOMe group on the pyr- 
idine ring (6i-p; Fig. 4b). We anticipated that achieving reactivity and 
positional selectivity with these substrates could be particularly challen- 
ging owing to the electron-deficiency of the pyridine ring, which deac- 
tivates the C-H bonds ortho to the CONHOMe group. We found that 
C-H functionalization of these 2-phenylpyridine substrates occurs ex- 
clusively ortho to the N-methoxy amide group, affording the desired 
products in good to excellent yields (7i-p, 74-90% yield). 

Finally, representative C-H functionalization products from this reac- 
tion were converted to synthetically useful lactams by hydrogenolysis 
with Pd/C under H₂ followed by treatment with trifluoroacetic acid. Our 
new catalytic system provides an operationally simple and versatile route 
to access medicinally important lactams (8a-d). We anticipate that 
the switch of the positional selectivity in the cyclopalladation step, which 
is often the selectivity-determining step, could be exploited in other catalytic 
C-H activation transformations. 
Agricultural Green Revolution as a driver of 
increasing atmospheric CO₂ seasonal amplitude 


Ning Zeng, Fang Zhao, George J. Collatz, Eugenia Kalnay, Ross J. Salawitch, Tristram O. West & Luis Guanter 


The atmospheric carbon dioxide (CO₂) record displays a prominent 
seasonal cycle that arises mainly from changes in vegetation growth 
and the corresponding CO₂ uptake during the boreal spring and summer 
growing seasons and CO₂ release during the autumn and winter 
seasons. The CO₂ seasonal amplitude has increased over the past 
five decades, suggesting an increase in Northern Hemisphere biospheric 
activity. It has been proposed that vegetation growth may 
have been stimulated by higher concentrations of CO₂ as well as by 
warming in recent decades, but such mechanisms have been unable 
to explain the full range and magnitude of the observed increase in 
CO₂ seasonal amplitude. Here we suggest that the intensification 
of agriculture (the Green Revolution, in which much greater crop yield 
per unit area was achieved by hybridization, irrigation and fertilization) 
during the past five decades is a driver of changes in the 
seasonal characteristics of the global carbon cycle. Our analysis of 
CO₂ data and atmospheric inversions shows a robust 15 per cent 
long-term increase in CO₂ seasonal amplitude from 1961 to 2010, 
punctuated by large decadal and interannual variations. Using a terrestrial 
carbon cycle model that takes into account high-yield cultivars, 
fertilizer use and irrigation, we find that the long-term increase in 
CO₂ seasonal amplitude arises from two major regions: the mid-latitude 
cropland between 25° N and 60° N and the high-latitude 
natural vegetation between 50° N and 70° N. The long-term trend of 
seasonal amplitude increase is 0.311 ± 0.027 per cent per year, of which 
sensitivity experiments attribute 45, 29 and 26 per cent to land-use 
change, climate variability and change, and increased productivity 
due to CO₂ fertilization, respectively. Vegetation growth was earlier 
by one to two weeks, as measured by the mid-point of vegetation 
carbon uptake, and took up 0.5 petagrams more carbon in July, the 
height of the growing season, during 2001-2010 than in 1961-1970, 
suggesting that human land use and management contribute to seasonal 
changes in the CO₂ exchange between the biosphere and the 
atmosphere. 
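
As a quick illustration of the attribution stated above, the sketch below splits the 0.311 per cent per year trend into the three contributions using the quoted shares (45, 29 and 26 per cent). It assumes the shares apply multiplicatively to the central trend value and ignores the ±0.027 uncertainty; the dictionary keys are illustrative labels only.

```python
TREND_PERCENT_PER_YEAR = 0.311

# Shares attributed by the sensitivity experiments (per cent of the trend).
shares_percent = {
    "land_use_change": 45,
    "climate_variability_and_change": 29,
    "co2_fertilization": 26,
}

# Contribution of each factor to the trend, in per cent per year.
contributions = {
    factor: TREND_PERCENT_PER_YEAR * share / 100.0
    for factor, share in shares_percent.items()
}

for factor, value in contributions.items():
    print(f"{factor}: {value:.3f} % per year")
```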

In a 50-year time span from 1961 to 2010, the world population more 
than doubled, from 3 billion to 7 billion people, while crop production 
tripled, from 0.5 petagrams of carbon per year (Pg C yr⁻¹) to 1.5 Pg C yr⁻¹ 
(Fig. 1). The threefold increase in crop production was accompanied by 
a mere 20% increase in the land area of major crops, from 7.2 million km² 
to 8.7 million km² (Extended Data Table 1). Higher crop production is 
thus due mostly to greater yield per unit area, an extraordinary technological 
feat that is often termed the agricultural Green Revolution. The 
higher yield can be attributed to three major factors: high-yield crop 
varieties such as high-yield corn, hybrid dwarf rice and semi-dwarf wheat, 
use of fertilizer and pesticide, and widespread use of irrigation. 
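
The claim that higher production is due mostly to greater yield per unit area follows from the figures just quoted. A minimal sketch of the arithmetic, using only the numbers in the paragraph (variable names are illustrative):

```python
# Crop production in Pg C per year and cropland area in million km^2.
production_1961, production_2010 = 0.5, 1.5
area_1961, area_2010 = 7.2, 8.7

production_ratio = production_2010 / production_1961  # threefold increase
area_ratio = area_2010 / area_1961                    # roughly 20% more land
yield_ratio = production_ratio / area_ratio           # yield per unit area

print(f"production x{production_ratio:.2f}, area x{area_ratio:.2f}, "
      f"yield per unit area x{yield_ratio:.2f}")
```

With these inputs the yield per unit area rises by roughly a factor of 2.5, so most of the threefold production increase comes from yield rather than from land expansion.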

The plausibility of a potential Green Revolution impact on the CO₂ 
seasonal cycle follows from a ‘back-of-the-envelope’ estimate. The 
global total terrestrial biosphere net primary productivity (NPP) is 
about 60 Pg C yr⁻¹, and the seasonal variation from peak to trough is 


Figure 1 | Changing world population, land area 
of major crops, annual crop production and 
changes in crop GPP seasonal cycle. Crop 
production tripled (a) to support 2.5 times more 
people (b) on only 20% more cropland area 
(c), enabled by the agricultural Green Revolution. 
Plotted in c is the VEGAS model simulated crop 
production, compared to the estimate from FAO 
statistics. The inset in c shows modelled GPP for 
the periods 1901-1910, 1961-1970 and 2001-2010 
for a location in the US Midwest agricultural 
belt (98° W, 40° N) that was initially naturally 
vegetated and later converted to cropland. The 
change in seasonal characteristics from these 
transitions may have contributed to the change 
in atmospheric CO₂ seasonal amplitude. 
30-60 Pg C yr⁻¹ (ref. 15). Of the NPP, about 6 Pg C yr⁻¹ (or 10%) is associated 
with crop production as the human-appropriated NPP. Assuming 
that half of crop NPP (that is, 3 Pg C yr⁻¹) is the increase due to the 
Green Revolution, this leads to an increase of global NPP by 5%-10% 
(3 divided by 60 or 30). This rate is substantial compared to the increase 
in CO₂ seasonal amplitude. 
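
The back-of-the-envelope estimate can be written out explicitly. The sketch below reproduces the 5%-10% range from the quoted numbers: a 3 Pg C per year increase in crop NPP compared against either the 60 Pg C per year global NPP or the lower 30 Pg C per year bound of the seasonal peak-to-trough variation. Variable names are illustrative.

```python
crop_npp = 6.0                             # Pg C per year, human-appropriated NPP
green_revolution_gain = crop_npp / 2.0     # assumed half of crop NPP

reference_values = {"global_npp": 60.0, "seasonal_variation_low": 30.0}

# Fractional increase relative to each reference value.
fractions = {
    name: green_revolution_gain / value
    for name, value in reference_values.items()
}

for name, fraction in fractions.items():
    print(f"relative to {name}: {fraction:.0%}")
```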

We studied this hypothesis by analysing a variety of observational data 
and model output, including the Mauna Loa Observatory CO₂ record 
from 1958 and a global total CO₂ index from 1981 (ref. 3), and atmospheric 
inversions Jena81 and Jena99 and the CarbonTracker. Another 
key tool is the terrestrial carbon cycle model VEGAS which, in a first 
such attempt, represents the increase in crop gross primary productivity 
(GPP) by changes in crop management intensity and harvest index (the 
ratio of grain to total aboveground biomass). Seasonal amplitude is calculated 
using a standard tool, CCGCRV (ref. 23). Details are in the Methods. 
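
The text defers the details of the amplitude calculation to the Methods. As a rough illustration of the idea only, and not of the standard tool named above (which involves curve fitting and filtering that are not reproduced here), a seasonal amplitude can be taken as the difference between the yearly maximum and minimum of a monthly series. The helper `yearly_amplitude` and the synthetic two-year series are assumptions made for this sketch.

```python
import numpy as np


def yearly_amplitude(monthly_values):
    """Peak-to-trough amplitude for each complete year of a monthly series.

    Simplified stand-in: assumes the series is already detrended and
    starts in January.
    """
    monthly = np.asarray(monthly_values, dtype=float)
    if monthly.size % 12 != 0:
        raise ValueError("series must contain whole years of monthly data")
    by_year = monthly.reshape(-1, 12)
    return by_year.max(axis=1) - by_year.min(axis=1)


# Synthetic example: a cosine seasonal cycle whose size grows by 15%.
months = np.arange(12)
base_cycle = 18.0 * np.cos(2.0 * np.pi * months / 12.0)
series = np.concatenate([base_cycle, 1.15 * base_cycle])

amplitudes = yearly_amplitude(series)
relative_change = amplitudes[1] / amplitudes[0] - 1.0
print(amplitudes, f"{relative_change:.0%}")
```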

The VEGAS model was run from 1701 to 2010, forced by observed 
climate, annual mean CO₂ and land-use and management history. The 
model simulates an increase in crop production from 0.6 Pg C yr⁻¹ in 
1961 to 1.4 Pg C yr⁻¹ in 2010, an increase of 0.8 Pg C yr⁻¹, slightly smaller 
than the Food and Agriculture Organization of the United Nations (FAO) 
statistics of 1 Pg C yr⁻¹ (Fig. 1). The net terrestrial carbon flux to the 
atmosphere (the net land-atmosphere carbon flux, F_TA) has a minimum in 
July, corresponding to the highest rate of vegetation growth and carbon 
uptake (Fig. 2 inset). The maximum of F_TA occurs in October, when 
growth diminishes yet the temperature is still sufficiently warm for 
high rates of decomposition in the Northern Hemisphere. The 
model-simulated seasonal cycle of F_TA, in both amplitude and phasing, is within 
the range of uncertainty from the atmospheric inversions (Extended 
Data Fig. 2). 
Figure 2 | Temporal evolution of seasonal amplitude. Trends for the 
VEGAS simulated F_TA (black), of the Mauna Loa Observatory CO₂ mixing 
ratio (CO2_MLO; green) and the global CO₂ mixing ratio (CO2_GLOBAL; purple), 
and F_TA from atmospheric inversions of Jena81 (red), Jena99 (brown) and 
CarbonTracker (blue). Changes are ratios relative to the 1961-1970 mean for 
VEGAS and the other time series are offset to have the same mean for 
2001-2010. Seasonal amplitude is calculated as the difference between the 

In the decade of 1961-1970, the average seasonal amplitude of F_TA
was 36.6 Pg C yr⁻¹. It increased to 41.6 Pg C yr⁻¹ during 2001-2010 (Fig. 2
inset). This amplitude increase appears mostly as an earlier and deeper
drawdown of CO2 during the spring/summer growing season. Using
−15 Pg C yr⁻¹, which is the mid-point of F_TA drawdown, as a threshold,
we find that the growing season has lengthened by 14 days, with
spring uptake of CO2 occurring 10 days earlier. The annual mean F_TA is
−1.6 Pg C yr⁻¹ for 2001-2010, implying a net sink whose value is within
the uncertainty range from global carbon budget analysis. This mean
sink increased over the period, suggesting a relation between seasonal
amplitude and the mean sink.
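
As an illustration of how a seasonal amplitude can be read off a monthly flux series, the sketch below removes a straight-line trend and then takes the yearly maximum minus the yearly minimum. It is a simplified stand-in and not the CCGCRV procedure used in the paper (which also applies band-pass filtering); the function names and the synthetic series are assumptions made only for this example.

```python
import math

def detrend_linear(values):
    """Subtract the least-squares straight line from a sequence."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return [y - (mean_y + slope * (x - mean_x)) for x, y in enumerate(values)]

def yearly_amplitude(monthly):
    """Maximum minus minimum within each complete 12-month block."""
    return [max(monthly[i:i + 12]) - min(monthly[i:i + 12])
            for i in range(0, len(monthly) - 11, 12)]

# Synthetic 10-year monthly series (hypothetical numbers): a slow drift
# plus a seasonal cycle whose half-range grows a little every year.
series = []
for month in range(120):
    year = month // 12
    half_range = 18.0 + 0.25 * year
    seasonal = half_range * math.cos(2 * math.pi * (month % 12) / 12)
    series.append(0.05 * month + seasonal)

amplitudes = yearly_amplitude(detrend_linear(series))
print([round(a, 1) for a in amplitudes])
```

On this synthetic input the yearly amplitudes rise steadily, which is the kind of signal summarized by the decadal means quoted above.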

The temporal evolution of the seasonal amplitude of F_TA exhibits a
long-term rise of 15% over 50 years, or 0.3% per year (Fig. 2 and Extended
Data Table 2; also see Extended Data Fig. 3 for the detrended monthly
time series). There are large decadal and interannual variabilities. The
Mauna Loa Observatory CO2 mixing ratio (CO2_MLO) shows a similar
overall trend but differs from VEGAS on decadal timescales. Most
noticeably, a rise in CO2_MLO during 1975-1985 precedes a similar rise in VEGAS
by several years. This rise was a focus of earlier research. A major caveat
is that the Mauna Loa Observatory CO2 data are not directly
comparable with modelled F_TA because this single station is also influenced by
atmospheric circulation, as well as fossil fuel emissions and
ocean-atmosphere fluxes. The comparison is nonetheless valuable because the
Mauna Loa Observatory data comprise the only long-term record, which
is generally considered representative of global mean CO2 (ref. 5).

We also include in our comparison a global total CO2 index (CO2_GLOBAL)
and F_TA from three atmospheric inversions. The seasonal amplitudes
of CO2_GLOBAL, Jena81 and VEGAS are similar but with some
differences in the early 1980s (Fig. 2). Otherwise they are similar to VEGAS,

maximum and the minimum of each year after detrending and band-pass
filtering with a standard tool, CCGCRV (Extended Data Fig. 3). A 7-year
bandpass smoothing removes interannual variability whose 1σ standard
deviation is shown for CO2_MLO (green shading) and VEGAS F_TA (grey
shading). The inset shows the average seasonal cycle of VEGAS F_TA for the two
periods 1961-1970 and 2001-2010, showing enhanced CO2 uptake during the
spring/summer growing season.

supporting the above interpretation of local influence in Mauna Loa
Observatory CO2 data. In contrast, if we consider only the period since
1981, Mauna Loa Observatory CO2 shows little trend because much of
the increase occurred earlier, in the 1970s. A decrease in seasonal
amplitude in the late 1990s is seen in all data, possibly owing to drought in the
Northern Hemisphere mid-latitude regions. Similarly, there is
consistency in the rapid increase in the first few years of the twenty-first
century. In our view, the change in the seasonal CO2 amplitude is best
characterized as a relatively steady long-term increase, modulated by 
decadal variations, though it can alternatively be viewed as several periods 
of slow changes or even slight decreases punctuated by large episodic 
increases. 

We further analyse the spatial patterns underlying the seasonal
amplitude of F_TA. The latitudinal distribution of seasonal amplitude of F_TA
(Extended Data Fig. 4) shows major contributions from Northern Hemi- 
sphere mid-high latitude regions 30° N-70° N, primarily driven by the 
large seasonal temperature variations there. The two subtropical zones 
centred at 10° N and 10° S have smaller but distinct seasonal cycles caused 
by the subtropical wet-dry monsoon-style rainfall changes. The South- 
ern Hemisphere between 40° S and 25° S has a clear seasonal cycle with 
the opposite sign to that of the Northern Hemisphere, but it is much 
smaller, owing to its smaller landmass. The atmospheric inversions also 
depict these broad features, in particular, the major peak in the Northern
Hemisphere. VEGAS overestimates the seasonal amplitude between
30° N and 45° N compared to both inversions. Because of seasonal phase
differences even within the same hemisphere, the latitudinal distribution 
does not automatically add up to the global total in the inset to Fig. 2; in 
particular, the Southern Hemisphere partially cancels out the Northern 
Hemisphere signal. 
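
The cancellation described above can be made concrete with two idealized sinusoidal cycles in opposite phase. This is only a sketch: the half-ranges and peak months are hypothetical values chosen for illustration, not model output.

```python
import math

def seasonal_cycle(half_range, peak_month):
    """Twelve monthly values of an idealized sinusoidal flux."""
    return [half_range * math.cos(2 * math.pi * (m - peak_month) / 12)
            for m in range(12)]

def amplitude(cycle):
    """Seasonal amplitude as maximum minus minimum."""
    return max(cycle) - min(cycle)

# Hypothetical bands: a large northern cycle and a small southern cycle
# whose peak is shifted by six months.
north = seasonal_cycle(20.0, peak_month=9)
south = seasonal_cycle(3.0, peak_month=3)
combined = [n + s for n, s in zip(north, south)]

sum_of_amplitudes = amplitude(north) + amplitude(south)
amplitude_of_sum = amplitude(combined)
print(sum_of_amplitudes, amplitude_of_sum)
```

Because the two cycles are out of phase, the amplitude of the combined flux is smaller than the sum of the band amplitudes, which is why the latitudinal distribution need not add up to the global total.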

Next, we examine the relative contributions of natural vegetation versus 
cropland in driving the rising seasonal amplitude of F_TA. We conducted
a similar latitudinal analysis of modelled F_TA but separated cropland
from natural vegetation, using a cropland mask for the year 2000. The 
results are shown in Fig. 3. Whereas the seasonal cycle is dominated by 
natural vegetation at high latitude, cropland is important in the latitude 
band from 25° N to 60° N, encompassing the world’s major agricultural 
lands of Asia, Europe and North America. Between 35° N and 45°N, 
the seasonal amplitude of F_TA on cropland is even higher than on natural
vegetation. In the Southern Hemisphere, there is some contribution from 
cropland between 20° S and 40° S. A confounding factor is the contem- 
poraneous change in cropland area. However, a sensitivity experiment 
conducted using the cropland mask of 1961 yielded similar results. 

The seasonal amplitude increase between the two time periods
1961-1970 and 2001-2010 is clear both in the naturally vegetated area and in
cropland area (Fig. 3). Over cropland, the seasonal amplitude increased
nearly everywhere, while a major increase occurred in Northern
Hemisphere natural vegetation between 50° N and 70° N. Because the model
is forced by the three factors of climate, CO2 and land-use changes, the
seasonal amplitude increase in natural vegetation can come only from
climate and CO2. Between 25° N and 50° N, there is little amplitude
change from natural vegetation, suggesting that the combined effect of
climate and CO2 is small there. This could be either because both effects
are small, or because climate and CO2 have opposite impacts that more
or less cancel each other out. Because CO2 fertilization likely enhances
NPP and therefore CO2 amplitude, changes in climate may have had
a negative impact on the mid-latitude natural vegetation. In contrast,
the large F_TA seasonal amplitude change seen in cropland area between
35° N and 55° N suggests that land use is responsible there, assuming
that crops respond to the combined effect of climate and CO2 in a way
similar to natural vegetation in the same climatic zone. The spatial pattern
of the NPP trend (Extended Data Fig. 5) shows the largest increase in
the Northern Hemisphere agricultural belts of North America, Europe
and Asia, supporting our interpretation that the intensification of
agriculture has a key role in F_TA seasonal amplitude change.

It may seem surprising that cropland can have such a large impact,
because crops are often considered less productive than the natural
vegetation they replace, though the opposite may be found for highly
productive crops or on irrigated arid land. However, for the impact on
the CO2 seasonal cycle, what matters most is that crops have a short but
vigorous growing season, leading to a sharper peak and larger seasonal
amplitude in GPP (Fig. 1c inset). A sensitivity experiment shows that
land-cover change interacts with land management in a non-trivial way
(Methods), but the contribution of crops to the increased seasonal
amplitude is due mostly to higher crop productivity. Recent space-based
measurements of sun-induced fluorescence (SIF) show vividly that at the
height of the Northern Hemisphere growing season (July), cropland
has the highest productivity, even more than the surrounding dense
forests with similar climate conditions (Extended Data Fig. 6), an effect
that is broadly captured by VEGAS, but in general not by the other three
models analysed.

To further delineate the relative contribution of climate, CO2
fertilization and land use, we conducted three additional model experiments,
termed CLIM, CO2 and LU, respectively. In each experiment, only one
of the three forcings is used as model driver, while the other two are
fixed. Figure 4 shows the evolution of F_TA seasonal amplitude, similar
to Fig. 2, but with the fluxes from the three experiments added
successively. The sum of the three experiments is similar but not identical to
the original simulation (ALL). We calculated the trend to be 0.088% per
year for CLIM, 0.076% for CO2, and 0.135% for LU, corresponding to
percentage contributions of 29%, 26% and 45% (Extended Data Table 2).

Figure 3 | Latitudinal distribution of the seasonal amplitude of F_TA. Calculated separately for natural vegetation (green lines) and cropland (red lines), for the
averages of two periods 1961-1970 (dashed) and 2001-2010 (solid).

Figure 4 | Attribution of causes with factorial
analysis. Relative change of seasonal amplitude
from three sensitivity experiments, each with a
single forcing: climate only (CLIM, green), CO2
only (CO2), and land use and management only
(LU). The results from CO2 (blue) and LU (red) are
added on top of CLIM sequentially. The ALL
experiment (black) is the same as in Fig. 2, driven
by all three forcings.




The SUM of the three is 0.299% per year, or 3% per decade, or 15%
over 50 years. Given uncertainties in the model and data (Methods and
Extended Data Fig. 8), the quantitative attribution should be considered
merely suggestive. In particular, VEGAS has a CO2 fertilization strength
that is weaker than in some other models that can account for the full
amplitude change with fertilization alone. A more challenging task
would be to explain spatial patterns better, because models may
significantly underestimate the high-latitude trend even if the global total is
simulated correctly, the latter being the focus of this paper. Carbon cycle
models may have a long way to go in explaining the long-term changes
in the seasonal cycle, but our results strongly suggest that
intensification of agriculture should be included as a driver.
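
The arithmetic behind the factorial attribution can be retraced from the rounded trends quoted in the text. The sketch below uses only those three numbers; the variable names are illustrative. With rounded inputs the CO2 share comes out near 25.4% rather than the reported 26%, a difference that presumably reflects unrounded trend values in Extended Data Table 2.

```python
# Trends in seasonal amplitude (% per year) quoted in the text for the
# single-forcing experiments.
trends = {"CLIM": 0.088, "CO2": 0.076, "LU": 0.135}

total_trend = sum(trends.values())            # % per year
shares = {name: 100.0 * value / total_trend   # % of the summed trend
          for name, value in trends.items()}

per_decade = total_trend * 10                 # % per decade
over_50_years = total_trend * 50              # % over 50 years

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of the summed trend")
print(f"sum: {total_trend:.3f}% per year, "
      f"{per_decade:.2f}% per decade, {over_50_years:.2f}% over 50 years")
```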

It is generally known that land-use activities such as deforestation 
and intense agriculture tend to release carbon to the atmosphere, and 
that recovery from past land clearance sequesters carbon. Our study here 
suggests yet another aspect of human impact on the global carbon cycle: 
the basic seasonal characteristics of the biosphere, as indicated by
atmospheric CO2, have been modified by human land-management activities.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

Direct human influence on atmospheric CO2
seasonality from increased cropland productivity

Ground- and aircraft-based measurements show that the seasonal
amplitude of Northern Hemisphere atmospheric carbon dioxide
(CO2) concentrations has increased by as much as 50 per cent over
the past 50 years. This increase has been linked to changes in
temperate, boreal and arctic ecosystem properties and processes such as
enhanced photosynthesis, increased heterotrophic respiration, and
expansion of woody vegetation. However, the precise causal
mechanisms behind the observed changes in atmospheric CO2
seasonality remain unclear. Here we use production statistics and a carbon
accounting model to show that increases in agricultural productivity,
which have been largely overlooked in previous investigations, explain
as much as a quarter of the observed changes in atmospheric CO2
seasonality. Specifically, Northern Hemisphere extratropical maize,
wheat, rice, and soybean production grew by 240 per cent between
1961 and 2008, thereby increasing the amount of net carbon uptake
by croplands during the Northern Hemisphere growing season by
0.33 petagrams. Maize alone accounts for two-thirds of this change,
owing mostly to agricultural intensification within concentrated
production zones in the midwestern United States and northern China.
Maize, wheat, rice, and soybeans account for about 68 per cent of
extratropical dry biomass production, so it is likely that the total impact of
increased agricultural production exceeds the amount quantified here.

Changes in the seasonality of Northern Hemisphere atmospheric CO2
concentrations were first noted three decades ago using data from
atmospheric monitoring sites at Mauna Loa, Hawaii and Barrow, Alaska.
Parallel evidence from remote sensing, ecosystem models, and eddy
covariance measurements has established that Northern Hemisphere
extratropical growing seasons have become longer, with concomitant
changes in species composition, photosynthetic activity, and ecosystem
respiration in boreal and arctic terrestrial ecosystems. Hence, to explain
observed increases in CO2 seasonality, most studies have focused on the
role of climate-induced changes to the terrestrial biosphere in Northern
Hemisphere mid- to high latitudes.

Graven et al. recently compared Northern Hemisphere atmospheric
CO2 concentrations collected from aircraft around 1960 with similar
measurements collected around 2010. Their results not only confirm
patterns observed from ground stations, but also reveal a strong latitudinal
gradient in changes to the amplitude of CO2 seasonality, with
measurements collected over boreal and arctic regions showing larger increases
than measurements collected at lower latitudes. On the basis of the shape
of the seasonal CO2 cycle at higher latitudes, Graven et al. suggested that
longer growing seasons are insufficient to explain the observed changes
in atmospheric CO2 seasonality, and that enhanced uptake of CO2 during
the middle of the growing season must also be occurring. Consistent with
these results, our analyses show that changes in mid-latitude cropland
production, with shorter and more intense carbon uptake periods than
natural ecosystems, and where crop-specific yields have increased by
as much as 300% over the past 50 years (Fig. 1), explain a large and
previously unrecognized proportion of increases in the seasonality of
Northern Hemisphere atmospheric CO2.

Maize, wheat, rice, and soybeans (MWRS) account for about 64% of
global caloric consumption and 58% of global dry biomass
production. The bulk of this production occurs in extratropical regions where
MWRS represents an even larger share of dry biomass production (68%;
Extended Data Tables 1 and 2), and where production has increased
240% since 1965. Remarkably, the harvested area of extratropical MWRS
increased less than 18% over this time period, reflecting the fact that
production increases were overwhelmingly associated with more
productive agricultural practices rather than expansion of cultivated area.
Specifically, higher yields were facilitated by development and adoption
of improved cultivars and management practices in combination with
technological advances, particularly in irrigation and fertilization.
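
The two growth figures above imply a lower bound on the growth in aggregate production per unit of harvested area. The following back-of-envelope sketch assumes that "increased 240%" means a factor of 3.4 and treats 18% as the upper bound on area growth; it describes the four crops pooled together, not any single crop, and the variable names are illustrative.

```python
# Figures quoted in the text for extratropical MWRS.
production_increase_pct = 240.0   # production growth over the period
area_increase_max_pct = 18.0      # harvested area grew by less than this

production_factor = 1.0 + production_increase_pct / 100.0   # 3.4
area_factor_max = 1.0 + area_increase_max_pct / 100.0       # 1.18

# Production = yield x area, so the yield factor is at least
# production_factor / area_factor_max.
yield_factor_min = production_factor / area_factor_max
yield_increase_min_pct = (yield_factor_min - 1.0) * 100.0

print(f"aggregate yield grew by at least a factor of {yield_factor_min:.2f} "
      f"(about {yield_increase_min_pct:.0f}%)")
```

Under these assumptions, almost all of the production growth is attributable to higher output per hectare, consistent with the statement that expansion of cultivated area played a minor role.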

To quantify the contribution of croplands to changes in atmospheric
CO2 seasonality, we developed a carbon accounting methodology that
uses gridded time series of MWRS production statistics to calculate
MWRS net ecosystem production (NEP) during annual carbon uptake
and carbon release periods (CUP and CRP) for the Northern
Hemisphere extratropical zones defined by Graven et al. (see Methods). In
total, extratropical MWRS net primary production (NPP) increased by
0.88 petagrams of carbon (Pg C) between 1961 and 2008, which
corresponds to an additional 648 million tonnes of annually harvested biomass.
However, since the growing periods for MWRS are not completely in
phase with the primary Northern Hemisphere atmospheric CUP
(especially in areas supporting multiple cropping and winter wheat), roughly
one-quarter of total MWRS productivity occurs during the CRP, thereby
mitigating the net impact of total changes in cropland productivity on
the seasonality of atmospheric CO2.

After accounting for the proportions of uptake and release within the
CUP and CRP (see Methods), we estimate that changes in Northern
Hemisphere extratropical MWRS production increased NEP during
the CUP by 0.33 Pg. Since we assume that this carbon is returned to the
atmosphere during the CRP, the net effect is an increase in seasonal
biosphere-atmosphere carbon exchange of 0.66 Pg C (95% confidence
interval 0.49-0.90), from 0.25 Pg C in 1961 to 0.91 Pg C in 2008, a rate
of roughly 14 teragrams per year (Tg yr⁻¹) (Fig. 2a). Graven et al. used
inverse modelling to quantify the change in seasonal carbon exchange
over the same period. Their estimate of 1.3-2.0 Pg C is the additional
"seasonal net carbon transfer" (defined as half the sum of carbon
assimilated in the CUP and carbon released in the CRP in a net neutral system)
over all extratropical lands that is necessary to replicate the observed
seasonality enhancement in the atmospheric CO2 record, accounting
for transport and mixing processes. Thus, our results indicate that
changes in extratropical production of MWRS account for 17%-25%
of the enhanced carbon exchange needed to explain the increasing
seasonal amplitude of Northern Hemisphere atmospheric CO2.
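
The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is one reading that reproduces the quoted rate and the 17%-25% range; it assumes that the cropland quantity comparable to the "seasonal net carbon transfer" of Graven et al. is half the sum of uptake and release, that is, 0.33 Pg C. The variable names are illustrative.

```python
# Values quoted in the text (Pg C).
cup_uptake_increase = 0.33                    # extra NEP during the CUP
crp_release_increase = cup_uptake_increase    # assumed returned in the CRP
exchange_1961, exchange_2008 = 0.25, 0.91     # seasonal exchange
transfer_needed_low, transfer_needed_high = 1.3, 2.0  # Graven et al. range

# Total change in seasonal exchange and its mean rate in Tg C per year.
exchange_increase = cup_uptake_increase + crp_release_increase
rate_tg_per_year = (exchange_2008 - exchange_1961) * 1000.0 / (2008 - 1961)

# Seasonal net carbon transfer: half the sum of uptake and release.
net_transfer = 0.5 * (cup_uptake_increase + crp_release_increase)
share_low = net_transfer / transfer_needed_high
share_high = net_transfer / transfer_needed_low

print(f"exchange increase: {exchange_increase:.2f} Pg C")
print(f"rate: {rate_tg_per_year:.1f} Tg C per year")
print(f"share of needed transfer: {share_low:.1%} to {share_high:.1%}")
```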

Although increases in extratropical MWRS productivity have occurred 
throughout the Northern Hemisphere (Fig. 3), 88% of the enhanced 

Figure 1 | Latitudinal patterns of increased crop production. Average gridded production values were summed over one-degree latitudinal bands for three-year
intervals centred on 1965 and 2005 for maize (a), wheat (b), rice (c), soybeans (d) and MWRS (e).

Figure 2 | Attributing the enhanced seasonality. Annual contributions of
Northern Hemisphere extratropical MWRS production to atmospheric CO2
seasonality S_CO2,MWRS from 1961 to 2008 with 95% confidence intervals
(quantiles from 10° iterations) (a), contributions to the total increase by 

crop (b), and by region (c; see Extended Data Fig. 5). a shows a linear fit with
a slope of 14 Tg C yr⁻¹.


seasonal carbon exchange due to increased MWRS production is
associated with changes in North America (46%, mostly in the United States)
and East Asia (42%, mostly in China), where maize is the dominant crop
(Figs 2c and 3; Table 1). Further, even though wheat and maize account
for similar proportions of total contemporary extratropical MWRS
production (34% and 43%, respectively), maize accounts for over 66% of
the total change in atmospheric CO2 seasonality attributable to
croplands (Table 1; Fig. 2b). In contrast, wheat explains only 9% of the total
change because a substantial proportion of wheat production occurs
outside the atmospheric CUP (Extended Data Table 3). Rice accounts
for the second largest contribution to increased seasonality (14%; Table 1).
However, like wheat, the impact of rice on CO2 seasonality forcing is
relatively minor because a substantial proportion of total rice
production occurs outside of the CUP. The role of soybeans is also fairly modest,
accounting for 11% of the crop-induced increase in CO2 seasonality
forcing (Fig. 2b).

Crop-specific geographic patterns in MWRS production strongly
influence the relative contribution of different regions to total forcing on
atmospheric CO2 seasonality by croplands. Europe, for example, accounts
for 38% of contemporary extratropical wheat production and 20% of
total extratropical MWRS production, but contributed only 11% to the
increase in CO2 seasonality associated with increased MWRS
production (Figs 2c and 3). Total MWRS production is low throughout
central Eurasia (Fig. 3), accounting for only 6% of total contemporary
extratropical MWRS production. Further, because winter wheat is the
dominant crop in this region, central Eurasia accounts for only 2% of
the total change in CO2 seasonality attributable to agriculture (Fig. 2c;
Table 1). These results highlight the profound impact that increases in
North American and Chinese maize production have had on seasonal
carbon budgets of the extratropical Northern Hemisphere.

One of the most remarkable aspects of the changes in cropland
productivity we report here is that land used for MWRS production
currently occupies less than 6% of vegetated land areas in the extratropical
Northern Hemisphere. Thus, increases in CO2 seasonality associated
with MWRS production are being driven almost exclusively by crop
management practices and improved genetics that have profoundly
transformed the seasonal carbon budgets of intensively managed
agroecosystems. Increases in extratropical MWRS production over the past
50 years exceed 240%, whereas model inversions and atmospheric CO2
records imply that total uptake by terrestrial ecosystems during the
extratropical Northern Hemisphere growing season increased only 40%-60%
during the same period. Hence, our results indicate that management
of agricultural ecosystems occupying a relatively small proportion of
land area has had an outsized impact on the seasonality of Northern
Hemisphere atmospheric CO2. Further, most of this contribution occurred
in two key regions (northern China and the midwestern USA) via
enormous increases in production of a single crop: maize.

Many of the technologies enabling production increases are energy 
intensive, and are therefore sources of greenhouse gases (for example, 

Figure 3 | Increased production and seasonality. Geographic patterns of
increases in Northern Hemisphere extratropical MWRS production (P) from
1961-2008 (left), and the resulting increase in forcing to atmospheric CO2


fertilizer production, transportation, farm mechanization, and irrigation).
However, CO2 emissions associated with these technologies are
relatively aseasonal, and increases in these emissions over the last 50 years
are much smaller than changes in seasonal assimilation of CO2 arising
from increased crop productivity. Similarly, alternative crop residue
management practices (for example, no-till) can alter long-term
cropland soil carbon source-sink dynamics, but have relatively little impact
on the seasonality of carbon budgets. Hence, seasonal changes in CO2
emissions arising from changes in farming technology and practices are
small compared to those associated with changes in crop productivity.

Our analysis focused on MWRS because these four crops are the most 
important and geographically extensive food crops on the planet, and 
because there are high-quality, global, gridded time series available that 
allowed us to calculate crop-specific and spatially explicit MWRS NEP.
In doing so, however, our analysis excluded roughly 32% of Northern 
Hemisphere extratropical crop dry-biomass production. Since a large 
proportion of this unaccounted production occurs in crops with sea- 
sonal assimilation patterns that are largely in phase with the Northern 
Hemisphere CUP, it is likely that the total forcing on atmospheric CO2
seasonality due to cropland intensification exceeds the contribution 
from MWRS alone, perhaps substantially so. 

Current Earth system models do not replicate observed changes in
atmospheric CO2 seasonality. The results presented here suggest that
poor representation of agroecosystems within these models explains
a substantial proportion of this problem. Indeed, recent results from
satellite-borne Sun-induced fluorescence measurements show that both
process-based and data-driven models significantly underestimate GPP
in croplands, with errors as large as −75% in intensively cultivated areas
such as the midwestern USA and the North China plain. Improved
representations of contemporary farming practices (fertilization, irri- 
gation, herbicide/pesticide application), multiple cropping, the impact 


Table 1 | Percentage of increased extratropical MWRS seasonal carbon
exchange by crop and region

Crop       East Asia   North America   Europe   Central Asia   Total
Maize      24          35              7        <1             66
Wheat      3           1               3        1              9
Rice       14          <1              <1       <1             14
Soybeans   2           9               <1       <1             11
Total      42          46              11       2              100

seasonality (right). Values are shown as sums within 1° × 1° grid cells for
illustration, but analyses were conducted at 0.05° × 0.05° grid resolution. Cells
with values <0.1 Tg C are not shown (see Extended Data Fig. 6).


of weeds, pests and diseases on crop physiology and yields, and the higher 
tolerance of newer cultivars and hybrids to stresses (for example, drought 
tolerance, flooding) are therefore required for Earth system models to 
capture geographically and seasonally dependent variations in crop- 
land carbon budgets. In addition to improved process representations, 
improved data sets that provide spatially and temporally resolved infor- 
mation regarding cropland management practices are also needed. 

Numerous studies have documented changes in the Northern
Hemisphere biosphere over the past several decades, but few have explicitly
considered the linkage between these changes and increased atmospheric
CO2 seasonality. Changing terrestrial source-sink dynamics related to
CO2 fertilization, growing season length extension, enhanced
assimilation/respiration, and biome expansion have been invoked as a primary
mechanism leading to the increased atmospheric CO2 seasonality.
Analyses of global carbon budgets point to an increased land sink over
the past half-century, although the location of this sink and the causal
mechanisms behind it remain unclear. Although it is not
inconsistent with these studies, our analysis demonstrates that a substantial
portion of increased CO2 seasonality results from a process that is roughly
neutral in terms of its impact on the terrestrial carbon sink. Thus, care
must be taken when making inferences regarding the causal linkages
between CO2 seasonality and terrestrial carbon sink dynamics.

By identifying a large and previously unrecognized mechanism that
affects atmospheric CO2 concentrations, the results reported here
illuminate an important anthropogenic impact on global carbon budgets,
and reveal another pathway through which humans are fundamentally
altering the Earth system. In the coming decades, climate change impacts
on natural ecosystems are likely to continue, leading to ongoing (and
possibly accelerating) intensification of the seasonal cycle of atmospheric
CO2. In parallel, current projections suggest that global food production
will need to nearly double over the next 50 years, requiring
concomitant increases in cropland productivity, and by extension, imposing an
even stronger signature of human activities in atmospheric CO2.



Topologically associating domains are stable units of
replication-timing regulation

Michael P. Snyder, John A. Stamatoyannopoulos, James Taylor, Ross C. Hardison, Tamer Kahveci, Bing Ren

Eukaryotic chromosomes replicate in a temporal order known as the
replication-timing program. In mammals, replication timing is
cell-type-specific with at least half the genome switching replication timing
during development, primarily in units of 400-800 kilobases
(‘replication domains’), whose positions are preserved in different cell types,
conserved between species, and appear to confine long-range effects
of chromosome rearrangements. Early and late replication
correlate, respectively, with open and closed three-dimensional chromatin
compartments identified by high-resolution chromosome
conformation capture (Hi-C), and, to a lesser extent, late replication correlates
with lamina-associated domains (LADs). Recent Hi-C mapping
has unveiled substructure within chromatin compartments called
topologically associating domains (TADs) that are largely conserved in
their positions between cell types and are similar in size to
replication domains. However, TADs can be further sub-stratified into
smaller domains, challenging the significance of structures at any
particular scale. Moreover, attempts to reconcile TADs and LADs
to replication-timing data have not revealed a common, underlying
domain structure. Here we localize boundaries of replication
domains to the early-replicating border of replication-timing tran- 
sitions and map their positions in 18 human and 13 mouse cell types. 
We demonstrate that, collectively, replication domain boundaries 
share a near one-to-one correlation with TAD boundaries, whereas 
within a cell type, adjacent TADs that replicate at similar times obscure 
replication domain boundaries, largely accounting for the previously 
reported lack of alignment. Moreover, cell-type-specific replication 
timing of TADs partitions the genome into two large-scale sub-nuclear 
compartments revealing that replication-timing transitions are indis- 
tinguishable from late-replicating regions in chromatin composition 
and lamina association and accounting for the reduced correlation of 
replication timing to LADs and heterochromatin. Our results recon- 
cile cell-type-specific sub-nuclear compartmentalization and replica- 
tion timing with developmentally stable structural domains and offer 
a unified model for large-scale chromosome structure and function. 

Measurements of replication timing in human and mouse reveal chro- 
mosome segments with relatively uniform replication timing (constant 
timing regions, CTRs), mediated by clusters of near-synchronous initia- 
tion events that are heterogeneous in location from cell to cell and appear 
to fire through a stochastic mechanism“. Despite stochastic origin firing, 
CTRs are interrupted at reproducible locations by transitions between 
early and late replication called timing transition regions (TTRs; Fig. la). 
We mapped TTRs in 35 mouse and 31 human data sets as part of the 


Mouse ENCODE project consortium’. Replication timing of early TTR 
borders clustered better than late (Extended Data Fig. 1a), suggesting 
that initiation events defining early borders are coordinated, whereas 
events defining late borders are less synchronized, possibly resulting from 
passive fork fusion’. To investigate a possible relationship between TTRs 
and TADs (Supplementary Discussion), we aligned mouse embryonic 
stem cell (mESC) TTRs (Fig. 1b) and compared them to the direction- 
ality index used to define TAD boundaries (transitions from upstream 
to downstream interaction bias)*. A single shift from upstream to down- 
stream bias occurred within 500 kilobases (kb) of the average TTR, located 
near the aligned early border. Examination of individual TTRs indi- 
cated that TAD boundaries typically isolated early CTRs from TTRs, 
whereas TTRs and neighbouring late CTRs predominantly belonged to 
the same TAD (Fig. 1c and Extended Data Fig. 1b, c). Similarly, transi- 
tions between Hi-C compartments exhibited preferential TAD bound- 
ary alignment to the border of the compartment associated with early 
replication (‘compartment A’; Extended Data Fig. 1d). Hence, early TTR
borders separate TADs within compartment A from TADs within a compartment
interaction gradient along TTRs, whereas late TTR borders
have no detectable relationship to TAD structure. 
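
The directionality index is only named in the text above. As a rough illustration, the Python sketch below assumes the commonly used formulation in which, for each bin, A is the summed contact count with a fixed window of upstream bins, B is the same for downstream bins, and E = (A + B)/2. The window size, the function names and the toy contact matrix are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def directionality_index(contacts, window):
    """Directionality index (DI) per bin of a square, symmetric contact matrix.

    A = contacts with the `window` bins upstream, B = contacts with the
    `window` bins downstream, E = (A + B) / 2, and
    DI = sign(B - A) * ((A - E)**2 / E + (B - E)**2 / E).
    """
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = contacts[i, max(0, i - window):i].sum()   # upstream contacts
        b = contacts[i, i + 1:i + 1 + window].sum()   # downstream contacts
        e = (a + b) / 2.0
        if e == 0 or a == b:
            continue  # no contacts or no bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di

def upstream_to_downstream_transitions(di):
    """Indices i where DI is negative at bin i and positive at bin i + 1."""
    return [i for i in range(len(di) - 1) if di[i] < 0 and di[i + 1] > 0]

# Toy matrix (invented): two 4-bin domains with strong internal contacts.
toy = np.ones((8, 8))
toy[:4, :4] = 10
toy[4:, 4:] = 10

di = directionality_index(toy, window=3)
boundaries = upstream_to_downstream_transitions(di)
```

In this toy case the only switch from upstream to downstream bias falls between bins 3 and 4, which is where the two simulated domains meet.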

Examination of replication timing across TADs (Fig. 1e) revealed, with
few exceptions, that TADs were entirely early or late replicating, spanned
all or part of a single TTR, or contained converging TTRs that constitute
the previously described U-shaped replication-timing domains.
Replication-timing patterns across LADs were remarkably similar except
that LADs exclusively replicated during mid to late S phase (Fig. 1e), and
TADs that replicated early versus late exhibited clearly distinct levels of 
lamina association (Extended Data Fig. 2a-c). Consistent with observa- 
tions that TTRs associate with the nuclear lamina more frequently than 
CTRs with similar replication timing, we observed lamina association
within late-replicating regions and TTRs (Extended Data Fig. 2d, e),
explaining the modest correlation of LADs to replication timing. Although 
30% of TTRs did not overlap with a computationally called LAD, these 
TTRs still associated with the nuclear lamina to some degree (Extended 
Data Fig. 2f) and may interact preferentially with other repressive
sub-nuclear compartments. Together, these results revealed that TTRs
resemble late-replicating regions with no discontinuity at late TTR bor- 
ders, whereas early TTR borders are strong candidates for the structural 
boundaries of replication domains. 

Localizing the replication domain boundary to early TTR borders 
(hereafter referred to as replication domain boundaries) prompted us to 
devise a more precise algorithm to map replication domain boundaries.

Figure 1 | Early timing transition region borders align with topologically
associating domains and lamina associated domains. a, Constant replication
timing segments (CTRs) flanking a timing transition region (TTR) are
illustrated. b, The average and range of 8,433 aligned TTRs from 5 mESC data
sets (top). Vertical axis values are log2 ratios of early over late signal intensities,
with more positive values indicating earlier replication timing (and more 
negative values indicating later timing). Average directionality index values 
across the same TTRs (bottom). Transition from upstream to downstream bias 
indicates a topologically associating domain (TAD) boundary near the early 
border. c, Individual aligned TTRs arranged by distance between early or late 
borders and upstream to downstream bias transitions. d, Replication timing 
across individual mESC TADs or lamina associated domains (LADs). UD, 
U-shaped replication-timing domains. 


We included replication-timing data generated by Repli-seq (see Methods 
for details), and other human data sets for a total of 42 human data sets 
(Extended Data Table 1). We compared calls from replicate data sets to 
measure the technical variability with which replication domain bound- 
aries were defined using our methods (Extended Data Fig. 3). Since both 
Repli-chip (microarray analysis, see Methods for details) and Repli-seq 
protocols analyse cell populations and use replicated fragments that are 
several hundred kilobases (due to labelling time), differences in the breadth 
and depth of sequencing or array data point spacing along the chromosome
have little effect on resolution. Accordingly, Repli-chip and Repli-seq
data from the same cell types demonstrated a high degree of overlap
between calls (Extended Data Fig. 3). 

To determine the stability of replication domains during development, 
we generated a list of unique replication domain boundaries and classified
each boundary as either ‘TTR-present’ or ‘TTR-absent’ in each available
cell type (Fig. 2a). By examining the overlap of TAD boundaries
with the compiled list of replication domain boundaries, we found that 
nearly all TAD boundaries corresponded to a replication domain bound- 
ary (Fig. 2b). Importantly, a majority corresponded to replication domain 
boundaries that were TTR-absent in cells where the TADs were mapped 
(IMR90 cells), supporting the conclusion that TADs are stable during
development and function as replication domains.

Figure 2 | TADs align with TTRs from different cell types. a, Illustrated
examples of one TTR-present and one TTR-absent replication domain (RD)
boundary. b, Percentage of IMR90 TAD boundaries overlapping TTR-present
or all replication domain boundaries. c, Probability density functions for
IMR90 TAD boundaries and average IMR90 replication-timing profiles across
replication domain boundaries. Mean and 3 standard deviations from the mean
random density are indicated. d, Replication timing (top), 4C (middle), and
directionality index (bottom) across the Dppa2 locus in mouse ESCs and NPCs.
e, Replication timing across a chromosome rearrangement and the normal
profile with the nearest TAD boundary indicated.

The fraction of TAD
boundaries that did not align with any replication domain boundary is 
expected due to the portion of the genome with constitutive replication 
timing in the cell types for which data were available. Although nearly 
all TAD boundaries corresponded to replication domain boundaries, the 
reciprocal comparison indicated that many replication domain bound- 
aries did not coincide with a corresponding TAD boundary (Extended 
Data Fig. 4). Although alignments of either TTR-present or TTR-absent 
replication domain boundaries to TAD boundaries were statistically sig- 
nificant (Fig. 2c), alignment to TTR-absent replication domain bound- 
aries was not as strong (Fig. 2c), explained by incomplete TAD annotation 
and the observation that small TTRs lack a detectable relationship with 
TADs (Extended Data Fig. 5 and Supplementary Discussion). 
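
The overlap comparison described in this paragraph can be pictured with a small Python sketch. It is not the authors' procedure: it assumes a single chromosome, boundaries given as base-pair positions, and a hypothetical matching tolerance of 250 kb. All positions and names below are invented for illustration.

```python
from bisect import bisect_left

def fraction_near(query, reference, tolerance):
    """Fraction of query positions lying within `tolerance` bp of any reference position."""
    ref = sorted(reference)
    if not query:
        return 0.0
    hits = 0
    for pos in query:
        j = bisect_left(ref, pos)
        # Only the nearest reference positions on either side can be the closest one.
        nearest = [ref[k] for k in (j - 1, j) if 0 <= k < len(ref)]
        if any(abs(pos - r) <= tolerance for r in nearest):
            hits += 1
    return hits / len(query)

# Invented example: TAD boundaries and replication domain (RD) boundaries,
# each RD boundary flagged as TTR-present or TTR-absent in the cell type of interest.
tad_boundaries = [1_000_000, 5_000_000, 9_000_000, 14_000_000]
rd_boundaries = [
    (1_050_000, True),    # TTR-present
    (4_900_000, False),   # TTR-absent
    (9_200_000, False),   # TTR-absent
    (20_000_000, True),   # TTR-present
]

TOLERANCE = 250_000  # hypothetical tolerance, not a value from the study

all_rd = [pos for pos, _ in rd_boundaries]
present_rd = [pos for pos, present in rd_boundaries if present]

frac_all = fraction_near(tad_boundaries, all_rd, TOLERANCE)
frac_present = fraction_near(tad_boundaries, present_rd, TOLERANCE)
```

With these made-up positions, three of four TAD boundaries match some replication domain boundary, but only one matches a TTR-present boundary, mirroring the kind of contrast shown in Fig. 2b.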

To corroborate TAD stability across cell types, we also compared TAD 
calls to high-resolution chromosome conformation capture-on-chip (4C) 
interaction frequency data across a replication domain that switches
replication timing during mouse ESC differentiation to neural precursors.
In ESCs, where TTRs flank this domain, TAD boundaries and marked 
decreases in 4C interaction frequency are apparent near both replica- 
tion domain boundaries (ESC panels in Fig. 2d). However, in differen- 
tiated cells, where the replication domain is replicated at the same time 
as its neighbours, a TAD boundary is no longer called at the leftmost 
replication domain boundary, even though a sharp decrease in interac- 
tion frequency is detected by the higher-resolution 4C (NPC and cortex 
panels in Fig. 2d). Thus, the TAD boundary at this cell-type-specific TTR 
is stable during differentiation even though it is not identified as such 
by this Hi-C data set, providing additional evidence that TAD annota- 
tion is incomplete. To demonstrate the functional relationship between 
TADs and replication domains, we also compared the positions of TADs 
to replication-timing shifts observed previously at points of chromosome
rearrangement. Figure 2e shows a rearrangement that joined
otherwise early- and late-replicating regions. In this example, early
replication appears to have spread into the late region up to a point 
that coincides with the nearest TAD boundary, where a new TTR was 
formed. Similar results were observed for additional examples (Extended 
Data Fig. 6). Taken together, these results provide compelling evidence 
that TADs act as stable units of replication-timing regulation during 
development. 

To identify candidate factors involved in the developmental regulation 
of replication domains, we next compared replication domain bound- 
aries to histone modifications, transcription factor binding sites, and 
DNase I hypersensitive sites (DHS) mapped by the ENCODE consortia.
We aligned over 200 chromatin features to TTR-present replication 
domain boundaries in 7 mouse and 13 human cell types and found that 
only LAD boundaries were highly enriched in all the cell types where 
data were available (Fig. 3a, b and Extended Data Fig. 7). Notably, SUZ12 
is a component of the Polycomb repressive complex 2 responsible for 
the H3K27me3 modification, and both SUZ12 and H3K27me3 were
enriched at TTR-present replication domain boundaries in ESCs (Fig. 3a 
and Extended Data Fig. 7). However, strong enrichment was not observed 
in all cell types. Moreover, analysis of replication timing in Suz12 knock- 
out mESCs, which exhibit global loss of H3K27me3 (refs 25, 26), showed 
no significant differences in replication timing relative to a wild-type 
control (R = 0.95). 

Previously, we and others reported enrichment of other marks at 
early TTR borders (DHS; CCCTC-binding factor (CTCF)) or nearby
(~100 kb inside early CTRs) (H3K4me1/2/3, H3K36me3, and H3K27ac).
Enrichment peaks for these marks were broad and extended into the 
neighbouring early regions (Fig. 3b and Extended Data Fig. 7), indicating
that these properties are enriched within early regions, and partitioned
at the replication domain boundary, but we found no evidence
to suggest that these individual marks are locally enriched at replication 
domain boundaries in all cell types.

Figure 3 | TTR-present replication domain boundaries separate permissive
and repressed chromatin domains. a, b, Probability density functions for
chromatin features and replication timing across mESC TTR-present
replication domain boundaries. c, Chromatin states across the same
boundaries. d, True versus predicted classification rates comparing the
predicted classes of an unsupervised model trained on binding profiles for
seven transcription factors (CTCF, HCFC1, MAFK, P300, RNA Pol II,
ZC3H11A, and ZNF384) versus actual replication timing for all mESC TADs.
TADs considered ‘early’ by replication timing predominantly composed class
A, whereas ‘TTR’ and ‘late’ TADs predominantly composed class B. TFBS,
transcription factor binding sites data.

Consistent with the enrichment of these marks throughout early regions,
combinatorial analysis of histone modifications (H3K4me1/3, H3K27me3,
H3K36me3) revealed a relatively abrupt transition near replication domain
boundaries between
broad regions with either transcriptionally active or repressive chromatin 
marks (Fig. 3c), providing further evidence that ‘TTR-present’ replication
domain boundaries partition chromatin states. We also previously
reported enrichment of short-interspersed nuclear elements (SINEs) at
TAD boundaries, but this apparent enrichment at boundaries was due
to differential enrichment among TADs (Extended Data Fig. 8 and Sup- 
plementary Discussion). Similarly, densities of several DNA repeats and 
motifs were partitioned at replication domain boundaries and transitions
in nucleotide skew (‘N-domain’ boundaries) were enriched near
replication domain boundaries (Extended Data Fig. 7). Metazoan genomes
have been segmented into a manually selected number of chromatin
classes that correlate with replication timing. By combining data for
seven factors (CTCF, HCFC1, MAFK, P300, RNA Pol II, ZC3H11A, 
and ZNF384), we assigned each TAD into classes using an unsupervised 
approach (Supplementary Discussion). We obtained two TAD classes, 
termed A and B, indicating the presence of clearly recognizable differ- 
ences in the transcription factor composition of these classes, as well as 
clear similarities within each class. Class A corresponded to early TADs, 
whereas class B corresponded to TADs within either TTRs or late regions 
(Fig. 3d), with an overall error rate of 16%. The relatively high enrich- 
ment of HCFC1, MAFK, and RNA polymerase II within early versus late 
replication domains may account for the classes (Extended Data Fig. 9). 
Similar composition of TTRs and late CTRs provides further evidence 
that these regions are equivalent and are replicated differently based on 
their proximity to early replication domains. 
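
To make the "overall error rate" concrete, the following Python sketch scores predicted TAD classes against replication-timing categories, using the correspondence stated above (early TADs with class A; TTR and late TADs with class B). The labels are invented toy data, and the clustering step itself is not reproduced here because its details are given only in the Supplementary Discussion.

```python
from collections import Counter

# Correspondence stated in the text: early -> class A; TTR and late -> class B.
EXPECTED_CLASS = {"early": "A", "TTR": "B", "late": "B"}

def classification_error(timing_labels, predicted_classes):
    """Return (error_rate, confusion_counts) for two equally long label lists."""
    if len(timing_labels) != len(predicted_classes):
        raise ValueError("label lists must have the same length")
    confusion = Counter(zip(timing_labels, predicted_classes))
    wrong = sum(
        count
        for (timing, cls), count in confusion.items()
        if EXPECTED_CLASS[timing] != cls
    )
    return wrong / len(timing_labels), confusion

# Toy labels for 25 hypothetical TADs (not values from the study).
timing = ["early"] * 10 + ["TTR"] * 5 + ["late"] * 10
predicted = ["A"] * 8 + ["B"] * 2 + ["B"] * 4 + ["A"] + ["B"] * 9 + ["A"]

error_rate, confusion = classification_error(timing, predicted)
```

Here 4 of the 25 toy TADs land in the unexpected class, so the error rate is 4/25; the confusion counts show which timing category contributes each mismatch.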

Our results support a unifying model in which TADs are stable reg- 
ulatory units of replication timing (Fig. 4). In this ‘replication-domain 
model’, DNA synthesis begins within TADs that reside in the nuclear 
interior and contain features permissive for transcription. Meanwhile, 
replication gradually advances into adjacent later-replicating TADs that 
reside at the nuclear periphery or other repressive compartments and 
contain features associated with repressed transcription. This gradual 
progression forms a TTR that extends from the boundary separating 
early and late TADs to a context-dependent point (that is, independent 
of TAD structure, Extended Data Fig. 6a) determined by replication 
rate and time elapsed before replication origins throughout adjacent 
later-replicating TADs and the resulting forks merge. Similarly, TADs 
replicated by active origin firing in mid S phase form TTRs that extend 
into adjacent later-replicating TADs (Extended Data Fig. 6a).

Figure 4 | The replication domain model. Top left, replication timing across
three TADs replicated late in cell type 1. Early initiation of flanking regions
forms TTRs that extend from the left and right boundaries of TADs 1 and 3
respectively until origins throughout the late-replicating region fire. Top right,
TADs 1-3 arrange in transcriptionally repressive compartments of the nucleus.
Bottom left, in cell type 2, TAD2 is replicated early, creating new TTRs at
pre-existing TAD boundaries. Bottom right, the switch to early replication is
associated with diminished interaction with the nuclear lamina and increased
interaction with other early-replicating TADs.

By contrast, timing transitions do not form at boundaries between adjacent
TADs residing in the same compartment due to coincidence of initiation
events within their structural boundaries. Upon differentiation,
TADs that switch replication timing acquire features associated with 
their new sub-nuclear compartment while their preexisting structural 
boundaries establish new compartment boundaries. The demonstra- 
tion that TADs are units of regulation reveals an important organiza- 
tional principle of mammalian genomes and represents a critical step 
towards understanding mechanisms regulating replication timing. Deter- 
mining whether replication timing dictates chromatin structure within 
TADs to influence chromatin interactions or vice versa will be an impor- 
tant area of future investigation. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

The drivers of tropical speciation

Since the recognition that allopatric speciation can be induced by
large-scale reconfigurations of the landscape that isolate formerly 
continuous populations, such as the separation of continents by plate 
tectonics, the uplift of mountains or the formation of large rivers,
landscape change has been viewed as a primary driver of biological
diversification. This process is referred to in biogeography as vicariance.
In the most species-rich region of the world, the Neotropics, the sun- 
dering of populations associated with the Andean uplift is ascribed 
this principal role in speciation. An alternative model posits that
rather than being directly linked to landscape change, allopatric spe- 
ciation is initiated to a greater extent by dispersal events, with the 
principal drivers of speciation being organism-specific abilities to 
persist and disperse in the landscape. Landscape change is not a
necessity for speciation in this model. Here we show that spatial and
temporal patterns of genetic differentiation in Neotropical birds are 
highly discordant across lineages and are not reconcilable with a model 
linking speciation solely to landscape change. Instead, the strongest 
predictors of speciation are the amount of time a lineage has persisted 
in the landscape and the ability of birds to move through the land- 
scape matrix. These results, augmented by the observation that most 
species-level diversity originated after episodes of major Andean uplift 
in the Neogene period, suggest that dispersal and differentiation on 
a matrix previously shaped by large-scale landscape events was a major 
driver of avian speciation in lowland Neotropical rainforests. 

In the species-rich Neotropics, the origins of biodiversity are usually 
linked to changes to the Earth’s landscape over geological time.
Palaeogeographic studies indicate that Andean mountain building during
the Neogene catalysed tumultuous changes in the lowlands, including 
formation of the Amazon River system, closure of the Isthmus of Panama, 
and the isolation of humid lowland forests east and west of the Andes 
by montane habitats and the aridification of the Caribbean lowlands in 
northern South America. These large-scale landscape changes are hypothesized
to have driven speciation by fragmenting species distributions that
were formerly continuous, a process that can generate congruent spatial 
and temporal patterns of genetic differentiation in co-distributed lineages, 
especially for lineages with similar ecological characteristics. Bolstering 
support for the importance of landscape change driving isolation in this 
region, time-calibrated phylogenies of a taxonomically diverse group of 
organisms encompassing a broad range of ecologies and dispersal abil- 
ities indicate that many modern Neotropical lineages originated during 
time periods associated with major reconfigurations of the landscape, 
presumably signifying a shared response to landscape history.

An alternative hypothesis is that the principal effect of Andean mountain 
building in the Neogene on speciation was the formation of a geographically
structured landscape matrix upon which subsequent diversification
occurred. Within the humid lowland forests of the Neotropics the landscape
contains mountains and rivers that restrict the movement of individuals 
across them (we use the term dispersal for these movements). Under this 
model, lineages with a longer occupation of the landscape have a 
higher likelihood of dispersing across geographical barriers and diver- 
sifying. In addition, lineages with lower dispersal ability are expected to 
accrue genetic differentiation between populations at a relatively higher 
rate than more dispersive lineages, leading to a higher rate of speciation.
In this model, lineage-specific attributes are predicted to be the primary
determinants of species diversity within lineages.

These two models of diversification in the Neotropics have been dif- 
ficult to evaluate empirically because: (1) large-scale comparative data 
are needed from multiple co-distributed lineages; (2) each lineage needs 
to be sampled densely across its range to identify phylogeographic breaks 
and to estimate within-lineage species diversity; (3) the sampled lineages 
must encompass a range of quantifiable dispersal abilities and ecological 
guilds in order to test how these variables affect speciation; and (4) the 
phylogenetic position of each lineage must be known to approximate lin- 
eage age. We assessed the relative support for these two models in explain- 
ing standing species-level variation by characterizing recent large-scale 
diversification using a comparative phylogeography data set containing 
over 2,500 individuals from 27 widespread bird lineages in the species- 
rich Neotropics (Supplementary Table 17 and Figs 1 and 2). Biological spe- 
cies often represent an inaccurate estimate of the true diversity in avian 
rainforest communities because the alpha taxonomies of most groups still 
require formal revision using modern methods. To minimize biases asso- 
ciated with species limits based on current taxonomy, we defined each 
lineage as all populations of a given taxon that represent, on the basis of 
available evidence, a monophyletic group, regardless of whether the lineage 
is currently treated as a single species or as a species complex that includes 
several closely related species. By examining relatively recent diversifica- 
tion at the phylogeographic scale, where extinction is less likely to have 
occurred, we minimized the confounding effects of extinction. Extinction 
is difficult to account for analytically and typically increases with time.

The Andes, the Isthmus of Panama and large rivers of the Amazon 
Basin (the Amazon, Madeira and Negro rivers) are prominent features 
of the Neotropical landscape that interrupt the distributions of the 27 
focal lineages to varying degrees (Fig. 1 and Supplementary Figs 1-27). 
The effect of the landscape on diversification is evident taxonomically, 
with distinct taxa usually located on opposite banks of Amazonian rivers, 
the Isthmus of Panama and the Andes. Biogeographers often treat regions 
delimited by these dispersal barriers as areas of endemism because of 
the accumulation within them of distinct taxa having common distributional
ranges (Extended Data Fig. 1).

Figure 1 | Sampling within the landscape matrix. Sampling points of the 27
bird lineages (circles) and prominent dispersal barriers within the landscape
matrix, including the Andes (and associated arid habitats in the Caribbean
lowlands of South America), the Isthmus of Panama and three major rivers in
the Amazon Basin (Amazon, Negro and Madeira Rivers).

The exact time of origin of the dispersal barriers separating these areas
is debated, but most data indicate that they achieved their modern
configuration during the Neogene (23-2.6 million years (Myr) ago).
Subsequent landscape changes during the Quaternary period (2.6 Myr ago to
present) were marked by fluctuations in forest cover driven by
glacial-interglacial cycles, but Amazonia remained forested even during
the cooler and drier glacial periods.
Genealogies of the 27 lineages exhibited substantial variation in the 
timing and spatial sequence of diversification associated with barriers 
(Fig. 3a, Supplementary Figs 1-27 and Supplementary Table 17). To test 
whether divergence events across the major dispersal barriers structuring 
these genealogies were consistent with a single episode of vicariance asso- 
ciated with barrier formation we used hierarchical approximate Bayesian 
computation (hABC), which is able to account for differences in genetic
drift among the 27 lineages (Extended Data Fig. 2 and Supplementary
Tables 3-7).

Figure 2 | Gene tree composed of 27 lineages of Neotropical birds, with
species at tips inferred using a Bayesian coalescent model. An exemplar
taxon for each lineage is illustrated. Yellow bars correspond to the 95% highest
posterior density for divergence times of each species. The Quaternary (2.6 Myr
ago-present) and the Neogene (23-2.6 Myr ago) periods are shaded in grey
and light blue, respectively. Mean stem ages for 25 of the lineages occurred
within the Neogene and for two lineages within the Quaternary. Outgroups for
each lineage are not included in the depicted phylogeny.

Instead of supporting a single event, the genetic data were
consistent with 9 to 29 divergence events across the Andes, with each 
event occurring at a different time (Bayes factor (Bf) = 0 when comparing
σ²/τ < 0.01 and σ²/τ > 0.01; Extended Data Fig. 2 and Supplementary
Information). The timing (τ) of most of these divergence events was in
the Pleistocene. These results suggest the Andean uplift did not have a 
direct cross-lineage effect on biological diversification via vicariance, but 
rather had an indirect role in divergence by acting as a semi-permeable 
barrier to post-uplift dispersal. We corroborated the above result of 
asynchronous cross-Andes divergences (Bf = 0.13) using hABC ana- 
lyses on multi-locus data sets (that is, > 100 loci) generated from target 
capture and next-generation sequencing on a selected sample of lineages, 
indicating the pattern was robust to possible bias associated with infer- 
ring population history from single-locus data (Extended Data Fig. 3 
and Supplementary Information). The numbers of temporally spaced 
events also did not support synchronous divergence across the Isthmus 
of Panama and the Amazonian rivers (Isthmus: 1-7 divergence events, 
Bf = 0.00; Amazon River: 1-3 divergence events, Bf = 0.01; Negro River: 
8-17 divergence events, Bf = 0.63; Madeira River: 3-8 divergence events, 
Bf = 0.66; Extended Data Fig. 2 and Supplementary Information), a 
pattern consistent with the permeability of these barriers.

We next examined to what extent speciation was influenced by the 
histories and ecologies of the 27 lineages. We selected two historical and 
two ecological summary variables previously implicated in avian diver- 
sification: (1) lineage age (a measure of evolutionary persistence), which 
we measured as the timing of a lineage’s divergence from its sister taxon
(stem age); (2) ancestral area of a lineage’s origin (east or west of the 
Andes); (3) foraging stratum, a measure of dispersal ability linked to the 
behaviour of birds (canopy, high dispersal ability or understorey, low 
dispersal ability); and (4) niche breadth (an indirect measure of dispersal 
ability based on habitat preference), estimated from climate-based ecol- 
ogical niche models (Supplementary Information). We then used phy- 
logenetic generalized least-squares analyses to test the effects of these 
variables on the number of species within each of the 27 lineages, as defined 
by a coalescent-based Bayesian species-delimitation method (Supplemen- 
tary Information and Extended Data Fig. 4). 

We found that a lineage’s intrinsic ability to persist in the landscape 
was an important driver of speciation. The number of species within a 
lineage was strongly predicted by lineage age (ΔAICc = 6.9586, where
ΔAICc refers to the change in the sample size-corrected Akaike information
criterion when a predictor variable was removed from the model
containing all predictor variables; Fig. 3b, Table 1 and Supplementary 
Tables 12 and 16). This relationship is consistent with the idea that the 
longer a lineage occupies the landscape the more opportunities it has 
to disperse and differentiate across geographical barriers. Although a 
sequence of vicariant events acting on a set of co-distributed lineages 
could produce a similar association between lineage age and species diver- 
sity, most of the species diversity we identified originated during the 
Pleistocene epoch (Fig. 2 and Supplementary Table 17; n = 142; 75% 
of species = 2.6 Myr ago), after the Neogene formation of the landscape 
matrix, but before the Last Glacial Maximum (26,500-19,000 years ago). 
At deeper phylogenetic timescales, a positive association between divergence
levels and lineage age has been used to explain greater species richness
in areas having had more time to accumulate species. It remains an
open question whether the phylogeographic-scale processes we docu- 
mented scale up to shape large-scale biodiversity patterns. To put our 
results into a broader temporal and spatial context would require a com- 
parison of recent diversification events between temperate and tropical 
lineages.
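
The ΔAICc statistic used above can be illustrated with a short Python sketch based on the standard definitions AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the number of fitted parameters and n the sample size. The log-likelihoods and parameter counts below are invented and are not the fitted values from the study; only n = 27 (the number of lineages) comes from the text.

```python
def aicc(log_likelihood, k, n):
    """Sample-size-corrected Akaike information criterion.

    AIC = 2k - 2 ln L; AICc = AIC + 2k(k + 1) / (n - k - 1).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def delta_aicc(loglik_full, k_full, loglik_reduced, k_reduced, n):
    """AICc(reduced model) minus AICc(full model).

    Positive values mean the fit worsens when the predictor is removed.
    """
    return aicc(loglik_reduced, k_reduced, n) - aicc(loglik_full, k_full, n)

# Hypothetical fits: a full model with 5 parameters and a reduced model
# with one predictor dropped (4 parameters), both fitted to n = 27 lineages.
N_LINEAGES = 27
delta = delta_aicc(loglik_full=-40.0, k_full=5,
                   loglik_reduced=-44.0, k_reduced=4, n=N_LINEAGES)
```

A larger positive ΔAICc indicates that dropping the predictor costs more in fit than it saves in model complexity, which is the sense in which lineage age is described as a strong predictor.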

Ecologically, we found that foraging stratum had a significant effect 
on species diversity (ΔAICc = 4.0122; Fig. 3, Table 1 and Supplementary
Tables 12 and 16), with the more dispersal-limited lineages restricted 
to the forest understorey exhibiting significantly higher species diver- 
sity than the more dispersive canopy lineages. This result corroborates 
previous work that documented the greater dispersal ability of canopy
species, presumably due to the physiognomy of the canopy and the patchier
distribution of food resources within it. The ability of individuals
to move through the landscape matrix has long-term consequences for 
the accumulation of diversity within lineages, assuming the lineage per- 
sists over evolutionary timescales. 

Studies of biological diversification have sought a general mechanism 
to explain the origins of the extraordinary diversity in Amazonia, with
most concluding that landscape change by geological, climatic or marine 
forces is the principal driver of speciation. Using a comparative phylo- 
geographic approach and incorporating the variability in ecology and 
evolutionary history among co-distributed lineages, we found that genetic 
patterns in birds are not easily reconcilable with a model in which diver- 
sification is a direct response to landscape change. Instead of finding the 
predicted shared response among lineages, our comparative analysis, 
and phylogeographic studies of other Amazonian organisms, found
extensive spatial and temporal discordance in genetic differentiation to
be the norm.

Figure 3 | Asynchronous divergence times across barriers and the influence
of lineage-specific traits on species diversity. a, The variation in divergence
times across barriers cannot be attributed to ecologically mediated
vicariance. There was no significant association between dispersal ability
and divergence times across the Andes and the Isthmus of Panama. Only part of
the variance in divergence times across rivers was attributable to dispersal
ability. Divergence levels across Amazonian rivers were generally shallower
in canopy birds, but understorey birds diverged multiple times across each
river. Circles represent mean estimates and bars represent the 95% highest
posterior density. Colour coding of the points corresponds to the foraging
stratum of each lineage: understorey, orange; canopy, green. Vertical hashed
lines at 2.58 million years represent the transition between the Neogene (to the
right of line) and Quaternary (to the left of line). b, Within-lineage species
diversity increases with lineage (stem) age. Solid lines represent the fit of the
data to a model using phylogenetic generalized least-squares analyses. Black
points and line correspond to mean stem ages, and the purple points and lines
correspond to the high and low values of the stem age 95% highest posterior
density. c, Box plot illustrating that species diversity is significantly higher in the
understorey lineages than in forest canopy lineages. The box plot shows the
first, second and third quartiles, the lines are the 95% confidence intervals and
the circles represent outliers. Significant associations in panels a, b and c are
supported by phylogenetic generalized least-squares analyses shown in Table 1
and Supplementary Tables 9-15. Statistical tests were performed independently
on each data set except for divergences across rivers; all rivers were combined
into a single analysis.

For example, divergence levels across the Andes were consistent
with 9 to 29 distinct divergence events (Extended Data Fig. 2).
Although highly suggestive of multiple dispersal events, this variation 
could be explained by a single vicariant event associated with the Andean 
uplift if the dispersal restrictions imposed by the barrier were heavily 
dependent on dispersal ability, such as was reported for a taxonomi- 
cally diverse group of marine organisms isolated by the formation of the 
Isthmus of Panama. In a similar fashion, the emerging Andes could have 
first become a barrier for bird lineages with low dispersal abilities, with 
fragmentation of the distributions of more dispersive lineages occur- 
ring later. However, we detected no significant associations between 
dispersal abilities and divergence times across the Andes and the Isthmus 
of Panama that would support a model of ecologically mediated vicar- 
iance for these barriers (Fig. 3a and Supplementary Tables 13 and 14). 
For the Amazonian rivers, only part of the variance in divergence levels 
was explained by dispersal ability (Supplementary Table 15) because there 
were multiple independent divergence events within the understorey 
lineages (Fig. 3a and Extended Data Fig. 2). Thus, the wide range of diver- 
gences across rivers cannot be reconciled with a model of ecologically 
mediated vicariance. As the stem ages of 25 of the 27 lineages we exam- 
ined date to the Neogene, we do not reject the possibility that the initial 
geographical isolation of populations at deeper phylogenetic scales was 
due to vicariance associated with the Andean orogeny or with the emer- 
gence of other landscape features. 

The accumulation of bird species in the Neotropical landscape occurred 
through a repeated process of geographical isolation, speciation and expan- 
sion, with the amount of species diversity within lineages influenced by 
how long the lineage has persisted in the landscape and its ability to dis- 
perse through the landscape matrix. A growing body of phylogenetic 


Table 1 | Phylogenetic generalized least-squares regression showing 
the effects of historical and ecological variables on species diversity 

Effect              Estimate   Standard error   t value   P        ΔAICc 
Lineage age          0.1187    0.0283            4.1907   0.0004    6.9586 
Foraging stratum     0.5188    0.2025            2.5623   0.0178    4.0122 
Ancestral origin    -0.1921    0.2023           -0.9495   0.3527   -1.9546 
Niche breadth        1.0097    1.0658            0.9473   0.3538   -1.9595 

Output is from the full model and ΔAICc refers to the change in AICc when each predictor variable was 
removed from the full model. Species diversity was square root transformed and (stem) lineage age is in 
units of millions of years. Full model AICc = 43.7365; adjusted R² = 0.567; F(4, 22) = 9.524; P < 0.001; 
n = 27 lineages. Model output for foraging stratum and ancestral origin corresponds to the comparison 
of the reference level (foraging stratum, understorey; ancestral origin, east of the Andes) for each 
categorical variable. 
evidence indicates that average rates of avian diversification have been 
relatively constant in the Neotropics and, consistent with this, our 
results show that tumultuous changes to the South American landscape 
may not have led to marked pulses in speciation. Correlations between 
lineage ages and the Andean uplift or Quaternary climatic events reported 
elsewhere are suggestive of landscape and environmental change being 
a component of the diversification process, but the details of how, when 
and to what extent these changes drove the origin of standing 
species-level diversity remain unclear. Our phylogeographic-scale analysis 
indicated most species-level variation postdates the Andean uplift, and our 
results contribute to a growing number of studies reporting dispersal 
events as the primary initiators of geographical isolation and speciation. 
Our results also have an important conservation implication. 
Anthropogenic alterations of the landscape matrix by deforestation and 
climate change affect not only the evolutionary persistence of rainforest 
lineages, but also the occurrence of cross-barrier dispersal events within 
lineages that lead to new biological diversity. 
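
As a quick consistency check on Table 1, each t value should be close to the estimate divided by its standard error, and the residual degrees of freedom should follow from n = 27 lineages and four predictors. The sketch below only re-derives these quantities from the printed numbers; the variable and function names are illustrative and are not taken from the paper, and small discrepancies are expected because the table entries are rounded.

```python
# Estimates, standard errors and reported t values copied from Table 1.
TABLE_1 = {
    "Lineage age": (0.1187, 0.0283, 4.1907),
    "Foraging stratum": (0.5188, 0.2025, 2.5623),
    "Ancestral origin": (-0.1921, 0.2023, -0.9495),
    "Niche breadth": (1.0097, 1.0658, 0.9473),
}


def t_value(estimate, std_error):
    """t statistic of a regression coefficient: estimate / standard error."""
    return estimate / std_error


# Recompute each t value and compare it with the reported one.
recomputed = {}
for effect, (estimate, std_error, reported) in TABLE_1.items():
    recomputed[effect] = t_value(estimate, std_error)
    print(f"{effect:17s} t = {recomputed[effect]:7.4f} (reported {reported:7.4f})")

# Residual degrees of freedom: n minus predictors minus the intercept.
N_LINEAGES = 27
N_PREDICTORS = 4
residual_df = N_LINEAGES - N_PREDICTORS - 1
print("residual d.f. =", residual_df)
```

The recomputed values agree with the printed t values to within rounding, and the residual degrees of freedom match the second value quoted with the F statistic in the table note.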


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Individual improvements and selective mortality 
shape lifelong migratory performance 
Billions of organisms, from bacteria to humans, migrate each year 
and research on their migration biology is expanding rapidly through 
ever more sophisticated remote sensing technologies. However, 
little is known about how migratory performance develops through 
life for any organism. To date, age variation has been almost 
systematically simplified into a dichotomous comparison between recently 
born juveniles at their first migration versus adults of unknown age. 
These comparisons have regularly highlighted better migratory 
performance by adults compared with juveniles, but it is unknown whether 
such variation is gradual or abrupt and whether it is driven by 
improvements within the individual, by selective mortality of poor performers, 
or both. Here we exploit the opportunity offered by long-term 
monitoring of individuals through Global Positioning System (GPS) 
satellite tracking to combine within-individual and cross-sectional data 
on 364 migration episodes from 92 individuals of a raptorial bird, aged 
1-27 years. We show that the development of migratory 
behaviour follows a consistent trajectory, more gradual and prolonged than 
previously appreciated, and that this is promoted by both individual 
improvements and selective mortality, mainly operating in early life 
and during the pre-breeding migration. Individuals of different ages 
used different travelling tactics and varied in their ability to exploit 
tailwinds or to cope with wind drift. All individuals seemed aligned 
along a race with their contemporary peers, whose outcome was largely 
determined by the ability to depart early, affecting their subsequent 
recruitment, reproduction and survival. Understanding how climate 
change and human action can affect the migration of younger 
animals may be the key to managing and forecasting the declines of many 
threatened migrants. 

The recent development of remote tracking is opening new 
opportunities in migration research by enabling the monitoring of a 
comprehensive suite of individual-level migration parameters over several years. 
These data are ideally suited to examine the ontogeny of migratory 
abilities throughout life. However, tracking studies conducted so far have 
looked at individuals of unknown age, or incorporated age as a 
comparison between first-year juveniles versus unknown-age adults. 
Furthermore, technological costs and naturally high juvenile mortality have 
typically resulted in small samples of 1-3 juveniles, favouring valuable 
analyses of migratory performance over small temporal scales (hourly to daily), 
but preventing insight into performance over a whole migration episode 
through successive years. Therefore, we are missing a comprehensive 
picture of how migration could change throughout life for any organism. 

Here, we fill this knowledge gap by examining the lifelong migration 
performance of a medium-sized raptor, the black kite (Milvus migrans). 
Kites spend their first 1 or 2 years in Africa and usually start breeding in 
Europe when they are 3-6 years old. Breeding performance and 
survival peak between ages 7 and 11 and decline thereafter. Populations of 
western Europe breed between March and August and winter in 
western Africa after a narrow-front migration funnelled through the Strait 
of Gibraltar (Fig. 1). Kites migrate individually and do not 
coordinate their movement consistently with specific individuals, but often 
travel within loose flocks of up to thousands of raptors and storks. As 
in other soaring birds, most of the migration is accomplished through 
the exploitation of uplift generated by air convection in thermals, the 
birds gaining height by circling in buoyant air and then gliding to the 
next thermal. All the individuals tracked in this study belong to the 
population of Doñana National Park (southwestern Spain), which has been 
subjected to intensive marking since the 1970s. This allowed us to equip 
92 individuals of known age with satellite devices, sampling all the ages 
in the population (1-27 years old) (Fig. 2a), and obtaining movement 
data for 364 migration episodes (162 pre-breeding and 202 post-breeding 


Figure 1 | A river of raptors. Migration routes of black kites born in Doñana 
National Park, southwestern Spain. Pre-breeding tracks are shown in red 
and post-breeding tracks in yellow. Eleven pre-breeding tracks starting from 
further south were shortened for clarity of presentation. 
journeys). Hereafter, for simplicity, we define 1-2-year-olds at their first 
migration as ‘juveniles’ and 3-6-year-olds as ‘young adults’. 

In the pre-breeding, return migration (northern spring), kites departed 
from Africa over 5 months (23 January to 23 June). This wide range was 
dictated by the sequential departure of different age contingents (Fig. 2a): 
departure date advanced steeply with age up until 7 years old, reaching a 
stable value thereafter. This sequence was consistent with both selective 
mortality of poor performers (see later) and with individual-level improve- 
ments: within each individual, departure improved (that is, it occurred 
earlier) most markedly in younger adults (Fig. 2b, grey bars), while repeat- 
ability was low for juveniles, moderate in young adults and stabilized at 
a high level thereafter (Fig. 2b, black bars). 

Once departed, kites progressed on average by 183 km per day (stop- 
overs included) and 209 km per travelling day (stopovers excluded), paus- 
ing for 3 days at 1.1 stopovers, flying for 8.5 h per day during 18 days over 
a 3,131 km route (Extended Data Table 1). All these components of migra- 
tion varied cross-sectionally with age, even after statistically controlling 
for the environmental conditions encountered en route (see Methods; 
Extended Data Tables 2, 3 and Fig. 3): (1) the speed on travelling days 
Figure 2 | Migration performance across and within individuals. a, Across 
individuals, pre-breeding departure date improved rapidly during the first 
7 years of life and then reached a plateau. b, In these initial years, surviving birds 
(red line) had earlier departure dates (right axis) than birds that died within the 
next year (blue line). Similarly, within individuals, the repeatability of departure 
(black bars, left axis) was lowest in the initial years of life, when individual 
improvements (grey bars, left axis) were highest, and stabilized after birds were 
7 years old. Therefore, the cross-sectional pattern depicted in a was consistent 
with both within-individual improvements and selective removal of inferior 
performers. Individual improvements were calculated as proportional changes 
from year t to year t + 1 and multiplied by 3 for clarity of presentation. Details 
of within-individual improvements and repeatability analyses are given in 
Extended Data Tables 5, 6 and 8. The fitted line in a is a smoother. Error bars 
represent 1 standard error of the mean (s.e.m.). 
declined linearly with age; (2) the overall speed was maximum for young 
adults; (3) journey duration replicated the speed patterns; (4) stopovers 
were longest in juveniles and shortest in young adults; (5) the hours of 
flight per day declined with age; (6) route length was minimal for young 
adults; and (7) juveniles migrated more eastward than others. Within 
individuals, all migration parameters depended on the advancement of 
departure date, even while controlling for environmental conditions (see 
Methods). Birds that advanced their departure less from one year to the 
next increased their speed more through fewer stopovers and shorter 
routes, indicating that they travelled ‘in a hurry’ (Extended Data Table 5). 
This suggested the possibility that individuals had a sense of their cur- 
rent performance compared with previous years. 
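
The two speed measures quoted for the pre-breeding journey (183 km per day with stopovers included, 209 km per travelling day with stopovers excluded) differ only in whether stopover days count towards the denominator. The sketch below illustrates that distinction using the mean route length, duration and stopover time given in the text for both seasons; the function names are illustrative, and treating speed as a simple ratio of those means is an assumption rather than the authors' stated calculation.

```python
def overall_speed(route_km, total_days):
    """Average progress per calendar day, stopover days included."""
    return route_km / total_days


def travelling_speed(route_km, total_days, stopover_days):
    """Average progress per day actually spent travelling."""
    return route_km / (total_days - stopover_days)


# Mean values quoted in the text (route length, journey duration, stopover days).
journeys = {
    "pre-breeding": {"route_km": 3131, "total_days": 18, "stopover_days": 3},
    "post-breeding": {"route_km": 2784, "total_days": 11, "stopover_days": 0.4},
}

speeds = {}
for season, j in journeys.items():
    speeds[season] = (
        overall_speed(j["route_km"], j["total_days"]),
        travelling_speed(j["route_km"], j["total_days"], j["stopover_days"]),
    )
    print(f"{season}: {speeds[season][0]:.0f} km/day overall, "
          f"{speeds[season][1]:.0f} km per travelling day")
```

The travelling-day figures from this ratio-of-means calculation land close to the reported 209 and 264 km, whereas the overall figures (about 174 and 253 km per day) fall below the reported 183 and 257 km. One plausible reason is that the published values average per-journey speeds, which need not equal the ratio of the averages.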

Thus, different aged birds employed different strategies and coped dif- 
ferently with environmental conditions (see interactions in Extended Data 
Table 2). In particular, 1-2-year-olds were potentially capable of 
travelling as fast as adults (Fig. 3a), but clearly suffered more from crosswinds 
(Extended Data Table 2), which pushed them eastward (Extended Data 
Fig. 1j) and forced them to pause for up to 15 days (Extended Data Fig. 1g); 
3-6-year-olds rushed to the breeding quarters with maximum speed, which 
they attained by flying more hours per day than adults (Extended Data 
Fig. 1h), by skipping stopovers (Extended Data Fig. 1g), thus 
straightening and shortening the route (Extended Data Fig. 1i), and by 
increasing their speed opportunistically with tailwinds (Extended Data Table 2). 
However, they were still slowed down by crosswinds (Extended Data 
Table 2), suggesting that the capability to cope with drift is a complex 
Figure 3 | Age-related changes in average speed, duration and timing of 
pre-breeding migrations. a, Speed was maximum for young adults and 
minimum for juveniles. b, Journey duration was longest for juveniles, shortest 
for young adults and intermediate in older kites. c, Arrival date occurred 
progressively earlier with age until seven years old and reached a stable value 
thereafter, replicating the departure pattern of Fig. 2a. The complete set of all 
components is shown in Extended Data Fig. 1. Error bars represent 1 s.e.m. 
task acquired more gradually and over longer timescales than previously 
demonstrated. Finally, older kites travelled much earlier, with a strong 
timing advantage, progressed more slowly (Fig. 3a) and increased their 
speed with tailwinds but less than younger birds (Extended Data Table 2), 
as reported for many migrants as a way to conserve energy. This 
suggested that energy minimization may be more important in older birds, 
whereas time minimization may be prevalent in younger individuals if 
they want to acquire a breeding territory (see below). Therefore, ageing 
changed the response to the opportunity offered by tailwinds and to the 
constraints imposed by crosswinds. 

However, age differences in departure dates were so large that differ- 
ential travelling tactics were insufficient to reverse the order of arrival, 
which essentially replicated the departure sequence (Fig. 3c and Sup- 
plementary Videos 1 and 2). Thus, departure date was probably the most 
important factor in terms of overall migration performance. This was 
confirmed by the fact that all tested components of fitness were related 
to departure date and no other aspect of migration. For young adults, 
earlier departure led to a higher probability of recruitment (Extended Data 
Table 7). A 10-day delay in departure caused an 11% decline in recruit- 
ment probability and this figure increased to 36% for a 30-day delay, which 
explained the time-minimization migratory tactic of young birds. Finally, 
within all age groups, earlier departure was associated with improved sur- 
vival, longevity and reproductive performance (Extended Data Table 7) 
and the survival relationship was particularly marked in the first 7 years 
of life (Fig. 2b). 

In the post-breeding migration (northern autumn), kites migrated 
under more favourable conditions (more thermal lift and dominant 
northeasterly trade winds): advancement was essentially propelled by air 
convection and tailwinds and these effects were generally uniform across age 
classes (Extended Data Tables 2 and 4). This favourable aeroscape 
probably allowed first-time migrants (fledglings) to depart synchronously with 
adults (Extended Data Fig. 2a), although none of them migrated with 
their own parents. In contrast, young adults departed later (Extended 
Data Fig. 2a), probably because they were prospecting and establishing 
breeding territories for the next year. Once departed, kites progressed 
by 257 km per day and 264 km per travelling day, pausing for 0.4 days 
at 0.4 stopovers, flying for 9.5 h per day during 11 days over a 2,784 km 
route (Extended Data Table 1). Again, while controlling for the condi- 
tions encountered en route, all migration components varied with age 
but some patterns were different from spring: speed increased with age, 
stopovers were longer in juveniles, and both young adults and juveniles 
were more deviated by crosswinds than older kites (Fig. 4, Extended Data 
Fig. 2 and Extended Data Tables 2 and 4). In general, repeatability was mod- 
erate to high only for departure and arrival dates (Extended Data Table 6), 
within-individual improvements were mostly related to changes in envi- 
ronmental conditions (Extended Data Table 8), and none of the migra- 
tion components led to higher fitness. Thus, autumn traits were more 
flexible, less tied to a tight schedule and probably shaped by life stage- 
specific tasks, such as prospecting by young adults. 

In conclusion, environmental forcings, the ability to cope with them, 
performance perception and differential life history tasks all interacted 
to generate a complex but predictable ontogeny of migratory performance 
mediated by within-individual improvements and selective mortality, 
operating most strongly in early life and during the pre-breeding migra- 
tion, that is, in the life stages and seasons that are least sampled by track- 
ing studies. Ageing was accompanied by gradual, rather than abrupt, 
improvements, continuing for up to 7 years and not always in the expected 
direction (for example, lower speed by older birds in spring). Indepen- 
dently of age, all individuals seemed aligned along a race to arrive early, 
especially to the breeding quarters, and its outcome was essentially deter- 
mined by the capability for early departure rather than fast travelling. 
Furthermore, the neat division into a temporal, age-structured sequence 
of travelling birds implied that each individual mainly raced against its 
contemporary peers (Supplementary Videos 1 and 2). Thus, some indi- 
viduals managed to improve their ability to cope with environmental 
conditions through their early life; they departed progressively earlier 
Figure 4 | Age-related changes in average speed, duration and timing of 
post-breeding migrations. a, Speed was maximum for older individuals, 
minimum for juveniles and intermediate for young adults. b, Journey duration 
became progressively shorter with age. c, Arrival date was earliest for older birds 
and latest for young adults. The complete set of all components is shown in 
Extended Data Fig. 2. Error bars represent 1 s.e.m. 


and attained high breeding and survival rates and thus higher longevity. 
In contrast, those that did not manage to improve their migratory per- 
formance did not recruit and progressively disappeared, implying that 
selection directly operated on the capability for individual improvement. 
The fact that such age variation was observed in a population travelling 
semi-socially from the extreme south of Europe suggests that the observed 
patterns could be even more extreme for individuals facing more dif- 
ficult conditions (for example, longer journeys, over larger stretches of 
water, incorporating harsher weather, or travelling solitarily). Finally, 
given that selection was stronger in those age classes least capable of 
coping with adverse environmental conditions, understanding how climate 
change and human action could affect the migration of younger 
animals may be the key to forecasting future impacts on many threatened 
migrants. For migratory animals, travelling strategies are 
inextricably tied to life history strategies, providing a tight link between 
movement tactics, life history decisions and demographic performance. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Dysregulated neurodevelopment with altered structural and 
functional connectivity is believed to underlie many neuropsychiatric 
disorders, and ‘a disease of synapses’ is the major hypothesis for the 
biological basis of schizophrenia. Although this hypothesis has gained 
indirect support from human post-mortem brain analyses and 
genetic studies, little is known about the pathophysiology of 
synapses in patient neurons and how susceptibility genes for mental 
disorders could lead to synaptic deficits in humans. Genetics of most 
psychiatric disorders are extremely complex due to multiple 
susceptibility variants with low penetrance and variable phenotypes. Rare, 
multiply affected, large families in which a single genetic locus is 
probably responsible for conferring susceptibility have proven 
invaluable for the study of complex disorders. Here we generated induced 
pluripotent stem (iPS) cells from four members of a family in which 
a frameshift mutation of disrupted in schizophrenia 1 (DISC1) 
co-segregated with major psychiatric disorders and we further 
produced different isogenic iPS cell lines via gene editing. We showed 
that mutant DISC1 causes synaptic vesicle release deficits in 
iPS-cell-derived forebrain neurons. Mutant DISC1 depletes wild-type 
DISC1 protein and, furthermore, dysregulates expression of many 
genes related to synapses and psychiatric disorders in human 
forebrain neurons. Our study reveals that a psychiatric disorder relevant 
mutation causes synapse deficits and transcriptional dysregulation 
in human neurons and our findings provide new insight into the 
molecular and synaptic etiopathology of psychiatric disorders. 

DISC1 was originally identified at the breakpoint of a balanced 
chromosomal translocation that co-segregated with schizophrenia, bipolar 
disorder and recurrent major depression in a large Scottish family. 
Another rare mutation of a 4 base-pair (bp) frameshift deletion at the 
DISC1 carboxy (C) terminus was later discovered in a smaller 
American family (pedigree H), which shares many similarities with the 
Scottish pedigree. DISC1 variants and polymorphisms have since been found 
to be associated with schizophrenia, bipolar disorder, major depression, 
and autism, and animal studies support a potential contribution of DISC1 
to the etiopathology of major mental disorders, including regulating 
neuronal development and synapse formation. Little is known about 
DISC1 function or dysfunction in human neurons. 

Pluripotent stem cells reprogrammed from patient somatic cells 
offer a new way to investigate mechanisms underlying complex human 
diseases. Using an episomal non-integrating approach we established 
iPS cell lines from pedigree H, including two patients with the 
frameshift DISC1 mutation (D2 (schizophrenia) and D3 (major depression)) 
and two unaffected members without the mutation (C2 and C3; Fig. 1a). 
We also included an unrelated healthy individual as an additional 
control (C1). We performed extensive quality control analyses and selected 
two iPS cell lines (indicated by 1 or 2, for example, C1-1 and C1-2) from 
each individual for detailed studies (Extended Data Fig. 1 and 
Supplementary Table 1a). 

We differentiated iPS cells into forebrain-specific human neural 
progenitor cells (hNPCs) expressing nestin, PAX6, EMX1, FOXG1 and OTX2 
(Fig. 1b; Extended Data Fig. 2a, b and Supplementary Table 1b), and 
then into MAP2AB+ neurons (99.92 ± 0.08%; n = 5). About 90% of 
neurons expressed VGLUT1 or α-CAMKII, indicative of glutamatergic 
neurons, whereas few neurons expressed VGAT (also known as SLC32A1) 
or GAD67 (GABAergic), and even fewer expressed tyrosine 
hydroxylase (TH) marker (dopaminergic; Fig. 1c and Extended Data Fig. 3). These 
neurons express different cortical layer markers, including TBR1, CTIP2 
(also known as BCL11B), BRN2 (also known as POU3F2) and SATB2 
(Fig. 1d). Quantitative analyses showed no differences in neuronal 
subtype differentiation among all lines (Fig. 1c, d and Extended Data Fig. 3). 

The mutant DISC1 allele is predicted to generate a frameshift 
mutant DISC1 protein (mDISC1) with 9 de novo amino acids at the C 
terminus (Extended Data Fig. 4a). Quantitative real-time PCR 
(qRT-PCR) analysis of a common exon 2 showed similar messenger RNA levels 
in different neurons (Extended Data Fig. 4b and Supplementary Table 1c). 
Strikingly, D2 and D3 neurons only expressed ~20% of the total DISC1 
protein detected in control neurons using antibodies that recognized 
both human full-length wild-type DISC1 (wDISC1) and mDISC1 when 
expressed in HEK293 cells (Fig. 1e). DISC1 interacts with itself and 
forms multimers, and sometimes aggregates. Given that patients are 
heterozygous for the DISC1 mutation (Extended Data Fig. 1), this 
result suggested a model in which mDISC1 interacts with wDISC1 to form 
aggregates and deplete soluble DISC1. Indeed, differentially tagged 
wDISC1 and mDISC1 co-immunoprecipitated when co-expressed in 
HEK293 cells (Extended Data Fig. 4c). mDISC1 significantly decreased 
soluble wDISC1 proteins in a dose-dependent manner and, furthermore, 
increased wDISC1 ubiquitination (Extended Data Fig. 4d, e). These 
results suggest a mechanism distinct from DISC1 haploinsufficiency in 
mutant human neurons. 

We next examined human forebrain neuron development. As in 
animal models, quantitative analyses showed that mutant neurons 
exhibited increased soma size and total dendritic length at 1 and 2 weeks 
after neuronal differentiation; however, these properties became 
indistinguishable from control neurons at 3 and 4 weeks (Extended Data 
Figure 1 | Normal neural differentiation, but markedly reduced total 
DISC1 protein levels in forebrain neurons derived from patient iPS cells 
carrying the DISC1 mutation. a, A schematic diagram of the pedigree for iPS 
cell generation. In addition, iPS cells from a control individual outside of the 
pedigree (C1, male) were used in the current study. The symbol + indicates 
one copy of the 4-bp deletion in the DISC1 gene; the symbol - indicates lack of 
the 4-bp deletion in the DISC1 gene. b-d, Neural differentiation of iPS 
cells. b, Sample bright-field and confocal images of nestin and PAX6 
immunostaining of hNPCs. See Extended Data Fig. 2 for characterization of 
additional forebrain neural progenitor markers. c, Sample confocal images of 
immunostaining of human neurons at 4 weeks after neuronal differentiation 
for VGLUT1 (also known as SLC17A7) and VGAT, and quantification of 
VGLUT1+ neurons among different iPS cell lines. Values represent 
mean ± s.e.m. n = 5 cultures. See Extended Data Fig. 3 for characterization of 
other markers. d, Sample confocal images of immunostaining for MAP2AB 
and neuronal subtype markers of different cortical layers, and quantification of 
neuronal subtype differentiation among different iPS cell lines. Values 
represent mean ± s.e.m. n = 4 cultures. Scale bars, 20 μm. e, DISC1 protein 
levels in forebrain neurons derived from different iPS cell lines. Shown are 
sample western blot images and quantification. Data were normalized to actin 
for sample loading and then normalized to C2-1 in the same blot for 
comparison. Values represent mean ± s.e.m. n = 3; ANOVA test. Note that 
the DISC1 antibodies used recognized both full-length human wDISC1 
(HA-tagged) and mDISC1 (Flag-tagged) exogenously expressed in 
HEK293 cells. 
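
The two-step normalization described for panel e (first to actin for loading, then to the C2-1 lane of the same blot) can be sketched as follows. The band intensities are invented purely for illustration and are not data from the study; the function name `normalise_blot` and the dictionary layout are likewise hypothetical.

```python
def normalise_blot(target, loading, reference="C2-1"):
    """Normalize band intensities to a loading control, then to a reference lane.

    target and loading map lane names to raw band intensities measured on
    the same blot; the result expresses each lane relative to the reference.
    """
    # Step 1: correct every lane for the amount of sample loaded.
    loading_corrected = {lane: target[lane] / loading[lane] for lane in target}
    # Step 2: express every lane relative to the reference lane.
    ref_value = loading_corrected[reference]
    return {lane: value / ref_value for lane, value in loading_corrected.items()}


# Hypothetical raw intensities (arbitrary units), NOT measurements from the paper.
disc1_bands = {"C2-1": 1000.0, "C3-1": 1100.0, "D2-1": 180.0, "D3-1": 240.0}
actin_bands = {"C2-1": 500.0, "C3-1": 550.0, "D2-1": 450.0, "D3-1": 600.0}

relative_disc1 = normalise_blot(disc1_bands, actin_bands)
for lane, value in relative_disc1.items():
    print(f"{lane}: {value:.2f}")
```

Normalizing to a lane on the same blot keeps values comparable across blots with different exposure, which is presumably why a within-blot reference was used; the reference lane always comes out as 1 by construction.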
Figure 2 | Defects of glutamatergic synapses in forebrain neurons carrying 
the DISC1 mutation. a, b, Decreased density of SV2+ puncta by human 
forebrain neurons derived from patient iPS cell lines carrying the DISC1 
mutation compared to control lines. a, Sample confocal images of SV2 and 
DCX immunostaining of neurons at 6 weeks after neuronal differentiation. 
Scale bar, 20 μm. b, Summaries of quantification of SV2+ puncta density for 
neurons derived from two iPS cell lines for each individual. Values represent 
mean ± s.e.m. n = 5 cultures; ANOVA test. c, d, Defects in glutamatergic 
synaptic transmission by DISC1 mutant neurons. Forebrain hNPCs were 
co-cultured on confluent astrocyte feeder layers. c, Sample phase images of 
co-culture and sample whole-cell voltage-clamp recording traces of excitatory 
spontaneous synaptic currents (SSCs). Scale bar, 20 μm. d, Distribution plots of 
SSC event intervals and amplitudes. n = 10-12 neurons for each condition; 
Kolmogorov-Smirnov test. Mean frequencies and amplitudes are also shown. 
e, Decreased vesicle release by DISC1 mutant neurons. Six-week-old neurons 
were imaged for KCl (60 mM) induced release of FM1-43. Values represent 
mean ± s.e.m. n = 4 cultures; ANOVA test. 
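
The Kolmogorov-Smirnov comparison named in panel d rests on the largest gap between two empirical cumulative distributions. The sketch below computes only that two-sample statistic, without a P value, in plain Python. The inter-event intervals are made-up numbers for illustration, not recordings from the study, and `ks_statistic` is a hypothetical helper rather than the authors' analysis code.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical inter-event intervals in seconds, NOT data from the paper.
control_intervals = [0.5, 0.8, 1.0, 1.2, 1.5]
mutant_intervals = [1.4, 2.0, 2.5, 3.0, 3.6]

d_stat = ks_statistic(control_intervals, mutant_intervals)
print(f"KS statistic D = {d_stat:.2f}")
```

Longer intervals in one group shift its cumulative curve to the right, which is how a lower event frequency would appear in an interval distribution plot; a full test would also convert D into a P value given the sample sizes.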


Fig. 5). Electrophysiological recordings of neurons did not show any 
consistent changes in their current-voltage (I-V) relationship at 4 weeks 
after differentiation (Extended Data Fig. 6). To examine synapse 
formation, we immunostained synaptic vesicle protein SV2 (Fig. 2a), which 
is associated with mature synaptic vesicles and regulates presynaptic 
release. The density of SV2+ synaptic boutons was significantly 
reduced in D2 and D3 neurons compared to control neurons at both 4 
and 6 weeks (Fig. 2b). We next performed whole-cell patch-clamp 
recordings of human neurons of similar densities co-cultured on astrocytes 
(Fig. 2c). The frequency of excitatory spontaneous synaptic currents 
(SSCs), but not the amplitude, was significantly lower for D2-1 and 
D3-1 neurons compared to those of C3-1 neurons at both 4 and 6 weeks 
(Fig. 2d), suggesting a presynaptic defect in synaptic release. Results 
appeared to be more complex when neurons derived from outside of 
the pedigree (C1) were compared. D2-1 neurons exhibited markedly 
reduced SSC frequency and amplitude compared to C1-1 neurons at 
4 weeks and slightly reduced frequency and amplitude at 6 weeks (Fig. 2d). 
For D3-1 neurons, similar results of reduced SSC frequency, but not 
amplitude, were observed when compared to C1-1 or C3-1 neurons 
at 4 or 6 weeks (Fig. 2d). Although uniform results were obtained from 
comparison of neurons derived from the same family, all 
electrophysiological data showed functional synaptic transmission deficits in DISC1 
mutant neurons and further suggested a component of presynaptic 
dysfunction. Indeed, quantitative FM1-43 imaging analyses revealed a 
significant defect in depolarization-induced vesicle release for mutant 
neurons compared to control neurons (Fig. 2e). 

To address whether the DISC1 mutation is necessary and/or sufficient 
for observed synaptic defects, we generated different types of isogenic 
iPS cell lines using transcription activator-like effector nuclease (TALEN; 
Fig. 3a). First, we corrected the 4-bp deletion in one mutant DISC1 iPS 
cell line (D3-2-6R). Second, we introduced the 4-bp deletion into two 
control iPS cell lines, one within the pedigree (C3-1-3M) and, impor- 
tantly, one outside of the pedigree (C1-2-5M) to control for potential 
effects of family genetic background. We confirmed successful gene 
editing by Sanger sequencing and validated the quality of targeted iPS 
cells (Extended Data Fig. 7). As expected, DISC1 protein expression 
was rescued in D3-2-6R neurons to a level comparable with control 
neurons, and reduced in C1-2-5M and C3-1-3M neurons to a level
similar to DISC1 mutant neurons (Fig. 3b).

We next compared forebrain neurons derived from isogenic and parental
iPS cell lines in parallel. Deficits in the density of SV2+ synaptic
boutons were rescued in D3-2-6R neurons and recapitulated in C1-2- 
5M and C3-1-3M neurons (Fig. 3c). To examine morphological synapses 
further, we co-immunostained neurons with presynaptic marker synap- 
sin 1 (SYN1) and postsynaptic marker PSD95 (also known as DLG4) 
(Fig. 3d). Quantification using the SYN1/PSD95 pair as a synapse mar- 
ker showed reduced density in an mDISC1-dependent fashion (Fig. 3d). 
Figure 3 | A causal role of the DISC1 mutation in regulating synapse
formation in human forebrain neurons. a, Generation of two types of
isogenic iPS cell lines. Shown on the left is a schematic illustration of
the gene editing strategy for correction of the mutation (4-bp deletion;
red bar) in a mutant iPS cell line and for knock-in of the same mutation
into two control iPS cell lines. HA, homology arm. Shown on the right are
sample images of iPS cell colonies for the correction line (D3-2-6R) and
the knock-in line (C3-1-3M) and confirmation by Sanger sequencing. Scale
bar, 50 μm. b, Expression of DISC1 protein in forebrain neurons derived
from different isogenic iPS cell lines. Shown are sample western blot
images and quantification of the total DISC1 protein level. Data were
normalized to actin for sample loading and then to C2-1 in the same blot
for comparison. Values represent mean ± s.e.m. n = 3; ANOVA test. c-f,
mDISC1-dependent regulation of synaptic puncta density and vesicle
release. d, Sample confocal images of SYN1 and PSD95 immunostaining.
Scale bar, 20 μm. Also shown are summaries of densities of SV2+ puncta (c)
or SYN1+ and PSD95+ pair (d) of 6-week-old neurons. Values represent
mean ± s.e.m. n = 4 cultures; ANOVA test. e, Summaries of SSC frequencies
and amplitudes. Values represent mean ± s.e.m. n = 10-16 neurons for each
condition; Kolmogorov-Smirnov test. f, Summary of FM1-43 imaging
analysis, similar to analysis in Fig. 2e. Values represent mean ± s.e.m.
n = 4 cultures; ANOVA test.


Functional electrophysiological recording and FM1-43-imaging ana- 
lyses also confirmed mDISC1-dependent presynaptic release defects 
(Fig. 3e, f). These results, from three different isogenic iPS cell lines, 
including the knock-in line from outside of the pedigree, establish a 
causal role for the DISC1 mutation in synaptic defects of human neurons
and suggest the pathogenic nature of this DISC1 mutation at the
cellular level. 

To gain molecular insight into how this pathogenic DISC1 mutation
causes synaptic defects, we performed RNA-seq analysis of 4-week-old 
forebrain neurons derived from a control (C3-1) and two mutant (D2- 
1 and D3-2) iPS cell lines in triplicate (Supplementary Table 2a). There 
were a large number of differentially expressed genes between C3-1 and 
D2-1/D3-2 neurons (false discovery rate < 5%; Fig. 4a and Supplemen- 
tary Table 2b, c), while the expression profiles of D2-1 and D3-2 were 
very similar (Extended Data Fig. 8a). Results from qRT-PCR analyses 
of selected genes using independent samples of C3-1 and D2-1 neurons 
were consistent with the RNA-seq data (Extended Data Fig. 8b). Detailed 
bioinformatic analyses revealed several striking features of differentially 
expressed genes. First, the top three significantly enriched categories from 
GO analysis were ‘synaptic transmission’, ‘nervous system development’ 
and ‘dendritic spine’ (Fig. 4a and Supplementary Table 2d). Second, a 
large number of genes encoding DISC1-interacting proteins were
differentially expressed (Fig. 4b). This result is surprising because previous
studies have not identified the transcriptional relationship between 
DISC1 and its protein-interacting partners. Third, 89 differentially 
expressed genes are linked to schizophrenia, bipolar disorder, depres- 
sion and mental disorders (Fig. 4c and Supplementary Table 2e). Thus, 
mDISC1 also functions as a hub for transcriptional regulation of genes 
implicated in psychiatric disorders. 
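
The enrichment of a category among differentially expressed genes is commonly scored with a hypergeometric tail probability, which is the quantity plotted for the GO and disease terms in Fig. 4a. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name and the counts in the usage line are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of genes in the background set
    annotated:  background genes carrying the term of interest
    sample:     number of differentially expressed genes
    hits:       differentially expressed genes carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 background genes, 5 annotated, 4 drawn, 2 hits.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
```

In practice the raw P values from many terms would still need a multiple-testing correction before being compared with a false discovery rate threshold.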


©2014 Macmillan Publishers Limited. All rights reserved 




[Figure 4, panels a-e; graphics not reproduced. a, Histogram of
log2FC (D2-1;D3-2/C3-1): downregulated, 2,124 transcripts (1,132 genes);
upregulated, 1,573 transcripts (877 genes). Enriched terms, plotted as
-log10 hypergeometric P value: Mental disorders (PA447208), Schizophrenia
(PA447216), Synaptic transmission (GO:0007268), Nervous system
development (GO:0007399), Anxiety disorders (PA447196), Bipolar
disorders (PA447199), Dendritic spine (GO:0043197), Voltage-gated cation
channel (GO:0022843), Cation channel activity (GO:0005261), Synaptic
membrane (GO:0097060), Postsynaptic density (GO:0014069). c, Major mental
disorders. d, Heat-map columns: presynaptic, postsynaptic, transporter.
e, Western blot labels include SYN1, SYP, GluR1, NR1 and ACTIN.]


Figure 4 | Dysregulation of neuronal transcriptome encoding a subset of 
presynaptic proteins, DISC1-interacting proteins and mental-disorder- 
associated proteins in human forebrain neurons carrying the DISC1 
mutation. a-c, Summary of RNA-seq analysis of 4-week-old forebrain 
neurons derived from C3-1, D2-1 and D3-2 iPS cells, n = 3 samples for each iPS 
cell line. a, Histographs of differentially expressed genes in DISC1 mutant 
neurons (both D2 and D3) compared to control neurons and GO analysis. 

b, Illustration of differentially expressed genes encoding DISC1-interaction 
proteins. Heat-map indicates mean values of differential expression for each 
gene. c, Illustration of differentially expressed genes that are related to mental
disorders. See Supplementary Table 2e for the gene list. d, Validation of
differential mRNA expression of selected genes related to synapses in forebrain
neurons from different isogenic iPS cell lines. Shown is a heat-map of mean
values of each gene under different conditions, n = 3 experiments. Values were
normalized to those of C3-1 neurons. See Extended Data Fig. 8c for details.
e, Validation of differential protein expression of selected genes in forebrain
neurons from isogenic iPS cell lines. Shown is a heat-map of mean values of
each protein under different conditions, n = 3 experiments. See Extended Data
Fig. 8d for details.


To extend these results and establish a causal link between differential
gene expression and the DISC1 mutation, we performed qRT-PCR analyses
of synapse-related genes using forebrain neurons derived from
multiple isogenic iPS cell lines. Differential expression of many genes
was found to be mDISC1-dependent (Fig. 4d and Extended Data Fig. 8c).
Consistent with a presynaptic defect, mRNAs for a number of presynaptic
proteins, including SYN isoforms 2 and 3, synaptophysin (SYP),
synaptoporin (SYNPR), neurexin 1 (NRXN1), and VAMP2, were increased
in neurons carrying the DISC1 mutation (Fig. 4d and Extended
Data Fig. 8c). Western blot analyses further confirmed increased protein
expression of SYN and SYP in mutant neurons (Fig. 4e and Extended
Data Fig. 8d). Previous studies in multiple neuronal systems have
shown that elevated synapsin levels suppress presynaptic neurotransmitter
release. In contrast, some postsynaptically localized proteins,
including GLUR1 (also known as GRIA1) and NR1 (also known as GRIN1),
were not affected at mRNA and protein levels in bulk preparations
(Fig. 4d, e and Extended Data Fig. 8c, d). We also observed differential
expression of several transporters (Fig. 4d). Notably, the transcription
factor MEF2C was drastically increased in mRNA and protein levels
in mutant neurons (Fig. 4d, e and Extended Data Fig. 8c, d). MEF2C
functions to restrict glutamatergic synapse numbers, and elevated MEF2C
decreases the frequency, but not the amplitude, of SSCs in mice, which
resembles what we observed in DISC1 mutant human neurons and suggests
an underlying molecular mechanism.

Our findings from studying human forebrain neurons derived from
a collection of patient iPS cells and different isogenic lines suggest a
model in which susceptibility genes for major psychiatric disorders
could affect synaptic function via large-scale transcriptional dysregulation
in human neurons. Our results illustrate a potential mechanistic
link in human patient neurons for three major hypotheses of complex 
psychiatric disorders—genetic risk, aberrant neurodevelopment, and syn- 
aptic dysfunction. We have developed an enhanced iPS cell model for 
schizophrenia and major mental disorders at the cellular level that
includes a high-penetrance and disease-related genotype, iPS cell lines 
from multiple members of the same family, different types of isogenic 
lines to address causality, and a relatively homogeneous neuronal sub- 
type population. A key challenge and opportunity for iPS cell disease 
modelling is to generate new insight into pathophysiology, as opposed 
to confirming existing hypotheses or validating previous results from 
animal models. Much of our knowledge of DISC1 functions has come 
from understanding the biology of DISC1-interacting proteins and the 
function of these protein complexes, derived mostly from rodent mod- 
els based on overexpression of truncated DISC1 proteins, or loss-of- 
function via genetic deletion or short hairpin RNA (shRNA) knockdown.
Unexpectedly, we found that disease-relevant, endogenous mutant DISC1 
in human neurons causes a large-scale transcriptional dysregulation of 
genes associated with synapses, DISC1-interacting proteins, and psy- 
chiatric disorders. Our DISC1 mutant phenotypes partially overlap with 
those observed in previous studies of neurons derived from idiopathic 
schizophrenia patient iPS cells, including decreased synaptic connectivity
and transcriptional dysregulation of certain genes, suggesting
the potential for a common disease mechanism. Our collection of
isogenic iPS cell lines and robust cellular phenotypes also provide a plat- 
form for mechanism-guided exploration of therapeutic compounds in 
correcting synaptic defects of human neurons and for nonbiased large-
scale screens.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Tissue-specific clocks in Arabidopsis show asymmetric coupling


Motomu Endo, Hanako Shimizu, Maria A. Nohales, Takashi Araki & Steve A. Kay


Many organisms rely on a circadian clock system to adapt to daily 
and seasonal environmental changes. The mammalian circadian 
clock consists of a central clock in the suprachiasmatic nucleus that 
has tightly coupled neurons and synchronizes other clocks in peripheral
tissues. Plants also have a circadian clock, but plant circadian
clock function has long been assumed to be uncoupled. Only a few
studies have been able to show weak, local coupling among cells.
Here, by implementing two novel techniques, we have performed a 
comprehensive tissue-specific analysis of leaf tissues, and show that 
the vasculature and mesophyll clocks asymmetrically regulate each 
other in Arabidopsis. The circadian clock in the vasculature has char- 
acteristics distinct from other tissues, cycles robustly without envir- 
onmental cues, and affects circadian clock regulation in other tissues. 
Furthermore, we found that vasculature-enriched genes that are rhy- 
thmically expressed are preferentially expressed in the evening, whereas 
rhythmic mesophyll-enriched genes tend to be expressed in the morn- 
ing. Our results set the stage for a deeper understanding of how the 
vasculature circadian clock in plants regulates key physiological res- 
ponses such as flowering time. 

To expedite tissue-specific analysis, we developed a technique to isol- 
ate three tissues of leaves with high spatiotemporal resolution. We based 
our strategy on a previously reported technique for mesophyll and
vasculature isolation. After optimizing the buffer and the isolation
technique, we were able to isolate all three major leaf tissues (mesophyll,
vasculature and epidermis) within 30 min (Fig. 1a and Extended Data
Fig. 1a, b). Isolated tissues appeared to be highly purified when observed 
under the microscope (Fig. 1a). 

As different types of tissues have different gene expression profiles, 
we applied Vandesompele's method to identify appropriate reference
gene sets. Among our 10 candidates, ASPARTIC PROTEINASE A1
(APA1) and ISOPENTENYL PYROPHOSPHATE:DIMETHYLALLYL
PYROPHOSPHATE ISOMERASE 2 (IPP2) showed lower gene-stability
values (M), suggesting stable expression in all tissues and time points
(Extended Data Fig. 1c). We therefore used the geometric mean of APA1
and IPP2 as an internal control in our quantitative real-time-PCR (qPCR)
analysis.
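
The gene-stability value M and the geometric-mean normalization factor can be illustrated as follows. This is a minimal sketch under the usual description of Vandesompele's method (M is the mean, over all other candidates, of the standard deviation of the pairwise log2 expression ratios across samples); the gene labels and expression values are hypothetical, and the sample standard deviation is an assumption of this sketch.

```python
from math import log2, prod
from statistics import mean, stdev


def stability_values(expr: dict[str, list[float]]) -> dict[str, float]:
    """Gene-stability value M for each candidate reference gene.

    expr maps a gene name to its relative expression in each sample
    (same sample order for every gene). A lower M means the gene varies
    less relative to the other candidates.
    """
    m_values = {}
    for gene_j, values_j in expr.items():
        pairwise_sd = []
        for gene_k, values_k in expr.items():
            if gene_k == gene_j:
                continue
            log_ratios = [log2(a / b) for a, b in zip(values_j, values_k)]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_sd)
    return m_values


def normalization_factors(expr: dict[str, list[float]],
                          refs: list[str]) -> list[float]:
    """Per-sample geometric mean of the chosen reference genes."""
    n_samples = len(expr[refs[0]])
    return [
        prod(expr[g][i] for g in refs) ** (1 / len(refs))
        for i in range(n_samples)
    ]


# Hypothetical values: REF_A and REF_B co-vary, VAR does not.
toy = {"REF_A": [1.0, 2.0, 4.0], "REF_B": [2.0, 4.0, 8.0], "VAR": [1.0, 8.0, 2.0]}
m = stability_values(toy)
factors = normalization_factors(toy, ["REF_A", "REF_B"])
```

Dividing each target gene's relative quantity by the per-sample factor gives the normalized expression; using two reference genes rather than one dampens the effect of any tissue-specific drift in a single gene.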

The purity of the isolated tissues was confirmed by detecting the expression
of the tissue-specific markers LIGHT-HARVESTING CHLOROPHYLL
B-BINDING 2.1 (LHCB2.1), SULPHATE TRANSPORTER 2;1 (SULTR2;1)
and GC1 by qPCR over 24 h (Fig. 1b). In addition, the three primary
vascular sub-tissues were identified by marker-gene expression analysis,
suggesting that the isolated vasculature is intact (Extended Data
Fig. 1d). The purity of vasculature was more than 90%, and that of
mesophyll and epidermis was more than 80% (Fig. 1c), indicating that the
results from isolated tissues predominantly reflect the dynamics of the 
respective specialized cells therein. About 77% of total leaf mRNA was 
derived from mesophyll cells, whereas only about 8% and 15% of mRNA 
was derived from vasculature and epidermis, respectively (Fig. 1d and 
Extended Data Fig. 1e), suggesting that previous circadian clock
studies, which primarily used whole leaves or whole plants as
the RNA source, mostly reflected circadian rhythms in mesophyll cells,
and gene expression dynamics in minor tissues such as vasculature or 
epidermis were largely overlooked. 

We next examined the expression of TIMING OF CAB EXPRESSION 
1 (TOC1) and CIRCADIAN CLOCK ASSOCIATED 1 (CCA1), and that 
of stress-induced genes under long-day conditions. In all three isolated 
tissues, 24-h oscillations of TOC1 and CCA1 expression were detected, 
and these were consistent with the whole leaf, indicating that the isola- 
tion process did not affect the rhythms of clock genes (Extended Data 
Fig. 1f). Also, no significant induction of stress-induced gene express- 
ion was observed (Extended Data Fig. 1g). 

By applying the direct tissue isolation technique, we investigated tissue- 
specific regulation of the Arabidopsis clock system. Wild-type plants were 
grown under long-day and short-day conditions, and whole leaves, me- 
sophyll and vasculature from cotyledons were collected every 4h over 
2 days. We then performed a time-course microarray analysis, and detected 
cycling genes and their diel phases, using the HAYSTACK algorithm
with a <3% false discovery rate (FDR) (Extended Data Fig. 2 and Sup- 
plementary Table 1). About 50% of the genes in the microarray were 
identified as cycling genes in each condition, and 96.3% of the genes in
the microarray were identified as cycling genes under at least one
condition tested, whereas only 10.5% of the genes in the microarray were
oscillating together, suggesting tissue-specific and day-length-specific
diel regulation (Extended Data Fig. 3a-c). We also detected 49 genes as
new candidates for reference genes that do not cycle under any
condition (Supplementary Table 2). The percentage of wave-shape model
usage and that of cycling transcripts with specific amplitude were
comparable among tissues and conditions (Extended Data Fig. 3d, e).

Figure 1 | Direct tissue isolation from cotyledons. a, Schematic drawings of
the tissue-isolation strategy and isolated mesophyll (left), vasculature (middle)
and epidermis (right) visualized by dark-field microscopy. See Methods for the
detailed protocol. Scale bars are 250 μm. b, Expression analysis of LHCB2.1,
SULTR2;1 and GC1 as mesophyll, vasculature and epidermis markers, in the
isolated tissues from 10-day-old seedlings grown under long-day conditions.
ZT, zeitgeber time. The figure shows representative qPCR results from the three
independent biological repeats. c, d, Purities of the isolated tissues (c) and
contribution ratios of each of them to whole-leaf mRNA (d) are estimated using
the data in Fig. 1b. See Methods for details. Values are mean ± s.e.m.; n = 14.

We first confirmed that known tissue-specific marker genes were cor- 
rectly identified as such in our microarray analysis (Extended Data Fig. 
4a, b and Supplementary Tables 3 and 4), and validated the geometric 
mean of APA1 and IPP2 as an appropriate reference for tissue-specific 
clock analyses (Extended Data Figs 1c and 4c). In conclusion, we con- 
firmed sufficient sensitivity and specificity in the microarray analysis, 
and defined twofold changes as significant differences.

We next observed global gene expression profiles in each tissue (Fig. 2a 
and Extended Data Fig. 5a, b). Highly expressed genes in vasculature at 
ZT 16 (blue-coloured genes) showed low expression levels in mesophyll, 
whereas genes that had lower expression in vasculature (green-coloured 
genes) showed higher expression levels in mesophyll. In whole leaves, 
the gene expression profile was pro-mesophyllic, consistent with our 
previous result that estimated about 80% of RNA in whole leaves came 
from mesophyll cells (Fig. 1d). Thus, we note that vasculature has inverse 
gene expression profiles compared to whole leaf and mesophyll. 

Figure 2 | Vasculature and mesophyll have different gene expression
profiles. a, Relative gene expression levels in whole leaf, mesophyll and
vasculature under long-day conditions (LD). Blue- and green-coloured genes
indicate higher and lower expression than average in the vasculature at ZT16,
respectively. As an example, the red line highlights the ELF4 expression profile.
b, Colour-coded expression level representation of the clock genes in the
circadian clock model. Mesophyll- and vasculature-rich genes are defined
based on arithmetic mean expression levels and frequencies. See Methods for
the detailed definition. SD, short-day conditions; V, vasculature; W, whole leaf.
c, Z-score profiles of mesophyll-rich genes (upper panel) and vasculature-rich
genes (lower panel) across the entire day. Dotted horizontal lines indicate
thresholds (FDR <3%). See Methods for details.

The current circadian clock model consists of multiple interlocking
loops. The morning loop consists of morning-expressed PSEUDO-
RESPONSE REGULATOR genes (PRR), LATE ELONGATED HYPOCOTYL
(LHY) and CCA1, and the evening loop consists of evening-expressed
EARLY FLOWERING genes (ELF), LUX ARRHYTHMO (LUX) and
TOC1. The core loop links these two loops. By comparing the arithmetic
mean expression levels in the vasculature with those in whole leaves, we
were able to define vasculature-rich genes and mesophyll-rich genes. We
found that the morning loop consists of mesophyll-rich genes, whereas
the evening loop consists of vasculature-rich genes (Fig. 2b). ELF4 express- 
ion is about tenfold higher in vasculature, suggesting that the functional 
ELF3, ELF4 and LUX tripartite evening complex resides primarily
in vasculature, even though ELF3 has rather mesophyll-rich expression. 
Consistent with this result, Z-score profiles of mesophyll-rich genes 
(twofold higher in whole leaf compared to vasculature) showed higher 
scores in the morning, indicating that mesophyll-rich genes tend to be 
expressed in the morning (Fig. 2c). Moreover, vasculature-rich genes 
(twofold higher in vasculature compared to whole leaf) tend to be 
expressed in the evening of the corresponding day length (Fig. 2c). 
Notably, significantly enriched gene ontology slim terms were com- 
prehensively different between mesophyll-rich and vasculature-rich 
genes, suggesting that the vasculature and mesophyll clocks have dif- 
ferent functions (Extended Data Table 1). 

To ascertain whether different tissues have different phases, we exam- 
ined PRR7, TOC1 and ELF4 as representative clock genes. Although 
the diel phases of these genes in the isolated tissues were not significantly 
shifted (Extended Data Fig. 5c), this was not the trend when comparing 
all cycling genes. Even accounting for phase randomization by noise, 
the ratio of phase-locked genes (+2 h) was reduced in vasculature versus 
whole leaf and mesophyll versus vasculature, compared to whole leaf 
versus mesophyll, indicating that vasculature and mesophyll have rela- 
tively distinct global phases (Extended Data Fig. 5d, e). We then exam- 
ined if the vasculature clock has characteristic regulatory targets. The 
P value of each cycling gene was ranked from the largest to the smallest, 
and the percentage of overlapping genes (POG) was used to assess the 
percentage of genes that were shared as common targets of the clock in 
a specific tissue. Higher POGs were observed in whole leaf versus meso- 
phyll, and lower POGs were observed in vasculature versus whole leaf 
and mesophyll versus vasculature (Extended Data Fig. 5f), indicating 
that the vasculature clock has relatively distinct, characteristic regulatory 
targets. Consistent with this notion, we identified two novel vasculature- 
specific elements that we named long-day vasculature element (LVE, 
ACACGG) and short-day vasculature element (SVE, GCGGGA), both 
of which showed a higher Z-score in vasculature but not in whole leaves 
and mesophyll (Extended Data Fig. 6). We also found that known
elements such as the telo-box, starch box and protein box were rather
mesophyll-enriched elements (Extended Data Fig. 6). 
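
The percentage of overlapping genes (POG) between two tissues can be sketched as the share of genes common to the top of two ranked lists. The exact definition used in this study is given in its Methods; the version below is one common form, and the ranking direction (most significant P value first), the function names and the toy P values are all assumptions made for illustration.

```python
def ranked_genes(p_values: dict[str, float]) -> list[str]:
    """Order genes by cycling P value, most significant first (assumed)."""
    return sorted(p_values, key=p_values.get)


def pog(rank_a: list[str], rank_b: list[str], top_k: int) -> float:
    """Percentage of genes shared by the top_k entries of two rankings."""
    shared = set(rank_a[:top_k]) & set(rank_b[:top_k])
    return 100.0 * len(shared) / top_k


# Hypothetical cycling P values for the same four genes in two tissues.
tissue_a = {"g1": 0.001, "g2": 0.01, "g3": 0.2, "g4": 0.5}
tissue_b = {"g1": 0.02, "g2": 0.6, "g3": 0.001, "g4": 0.3}

overlap_top2 = pog(ranked_genes(tissue_a), ranked_genes(tissue_b), top_k=2)
```

Plotting this value against increasing `top_k` for each pair of tissues gives a curve; a pair whose clocks share most targets stays high, whereas a pair with distinct targets stays low until `top_k` approaches the full list.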

To support the results obtained from isolated tissues with a non- 
invasive observation of promoter activity, we next developed a tissue- 
specific luciferase assay (TSLA) for real-time monitoring of tissue-specific 
promoter activity. We combined the split-luciferase complementation 
assay for detecting protein-protein interactions and the AP1 complex,
a heterodimer comprising Jun and Fos. The carboxy- and amino-terminal 
fragments of firefly luciferase (cLuc and nLuc) were fused to the carboxy 
terminus of A-Fos, the Fos leucine zipper with amphipathic acidic
extension, and the c-Jun bZIP domain, respectively. (A-Fos)-cLUC (Ac)
and (c-Jun bZIP domain)-nLUC (Jn) were then driven by tissue-specific 
and clock promoters, respectively (Fig. 3a). To spatiotemporally regulate 
the luciferase complementation, we used the TOC1 or CCA1 clock pro- 
moter and the SUCROSE-PROTON SYMPORTER 2 (SUC2) vasculature 
promoter to generate TOC1::Jn, CCA1::Jn and SUC2::Ac, respectively. 
Cauliflower mosaic virus (CaMV) 35S::Jn and CaMV35S::Ac were used 
as controls. These constructs were transformed into Arabidopsis, resulting
in the transgenic lines that we called CaMV35S/SUC2 TSLA, TOC1/SUC2
TSLA, TOC1/CaMV35S TSLA, CCA1/SUC2 TSLA, and CCA1/CaMV35S
TSLA. Compared to TOC1::LUC and TOC1/CaMV35S TSLA,
vasculature-specific luminescence was observed in 10-day-old TOC1/ 
SUC2 TSLA seedlings under 12-h light/12-h dark (L/D) conditions (Fig. 
3b-d). We also examined if the TSLA displayed rhythmic oscillations 
under free running conditions and confirmed that all lines tested except 
CaMV35S/SUC2 TSLA oscillated with around a 24-h period (Fig. 3e, f and
Extended Data Fig. 7). The circadian phase of CCA1 was locked between
CCA1/CaMV35S TSLA and CCA1/SUC2 TSLA, whereas the phase of TOC1
in TOC1/CaMV35S TSLA was shifted earlier compared to TOC1/SUC2
TSLA. These results reconfirmed our conclusion that there are divergent
properties of circadian clock regulation in the vasculature.

Figure 3 | Tissue-specific luciferase assay (TSLA). a, Schematic drawings of
the TSLA strategy. b-d, Luminescence images of TOC1::LUC (b), TOC1/SUC2
TSLA (c) and TOC1/CaMV35S TSLA (d) seedlings grown under L/D for 10
days. Right panels show magnified cotyledons. Scale bars are 1 cm (left) and
1 mm (right). e, f, Real-time monitoring of the luminescence of 10-day-old
TOC1/SUC2 TSLA #3 (n = 6) and TOC1/CaMV35S TSLA #3 (n = 12)
seedlings (e), and CCA1/SUC2 TSLA #12 (n = 14) and CCA1/CaMV35S
TSLA #2 (n = 12) seedlings (f) under L/D and free running conditions.
Mean ± s.d.; c.p.s., counts per second. Signals after subtraction of background
noise are shown.

The vasculature thus appears to have distinct gene expression dynamics, 
with characteristic circadian phases and regulatory targets. To test if 
the vasculature clock is robust in plants, we examined TOC1 expression 
in whole leaves and vasculature under L/D and free running conditions 
(Fig. 4a). The amplitude of TOC1 oscillation under L/D was comparable
between whole leaf and vasculature, the ratio between amplitude in the 
vasculature with respect to the amplitude in whole leaf being close to 1 
(Extended Data Fig. 8a). By contrast, when plants were in free running 
conditions, the amplitude of TOC1 in whole leaves damped rapidly at 
the third cycle, whereas a more persistent circadian rhythm was still 
maintained in the vasculature (Fig. 4a). Therefore, for every cycle under 
constant light conditions, the difference between the amplitudes in both 
tissues increased (Extended Data Fig. 8a). The robust circadian rhythm 
in the vasculature persisted for over one week. We also confirmed that 
the expression of other clock genes, such as CCA1 and ELF4, is
robust in the vasculature (Extended Data Fig. 8b, c). 
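
The comparison of damping between tissues rests on a per-cycle amplitude ratio (vasculature over whole leaf). A minimal sketch of such a calculation is given below; how amplitude was actually estimated is not stated in this section, so the half peak-to-trough definition, the function names and the toy time series are assumptions.

```python
def cycle_amplitudes(values: list[float], points_per_cycle: int) -> list[float]:
    """Half the peak-to-trough range within each complete cycle."""
    amplitudes = []
    last_start = len(values) - points_per_cycle
    for start in range(0, last_start + 1, points_per_cycle):
        cycle = values[start:start + points_per_cycle]
        amplitudes.append((max(cycle) - min(cycle)) / 2)
    return amplitudes


def amplitude_ratios(vasculature: list[float], whole_leaf: list[float],
                     points_per_cycle: int) -> list[float]:
    """Per-cycle ratio of vasculature amplitude to whole-leaf amplitude.

    Assumes the whole-leaf amplitude is non-zero in every cycle.
    """
    vasc = cycle_amplitudes(vasculature, points_per_cycle)
    leaf = cycle_amplitudes(whole_leaf, points_per_cycle)
    return [v / w for v, w in zip(vasc, leaf)]


# Hypothetical expression values, four samples per cycle, two cycles:
# the whole-leaf rhythm damps in the second cycle, the vasculature does not.
leaf_series = [0.0, 4.0, 0.0, 4.0, 1.0, 3.0, 1.0, 3.0]
vasc_series = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0, 0.0, 4.0]
ratios = amplitude_ratios(vasc_series, leaf_series, points_per_cycle=4)
```

A ratio close to 1 corresponds to comparable amplitudes, as reported under L/D, and a ratio that grows from cycle to cycle corresponds to faster damping in whole leaves under free running conditions.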

To test for asymmetric regulation between tissue-specific clocks, we 
produced a transgenic line for which the vasculature clock was per- 
turbed by overexpression of CCA1-GFP driven by the SUC2 promoter 
(SUC2::CCA1). We crossed the SUC2::CCA1 line with the TOC1::LUC
line, and observed a strong influence of the vasculature clock perturba- 
tion on the whole-leaf TOC1::LUC luminescence (Fig. 4b and Extended 
Data Fig. 8d), even though the RNA contribution ratio of vasculature is 
less than 10% (Fig. 1d and Extended Data Fig. 1e). We then monitored
TOC1 expression in isolated mesophyll and vasculature under free running
conditions. As shown in Fig. 4c, robust TOC1 expression in wild-type
vasculature was still observed, but it was weaker in whole leaves
and mesophyll. When the vasculature clock was perturbed by SUC2::CCA1
under the same conditions, TOC1 expression was perturbed not
only in vasculature but also in mesophyll, indicating the dominance of 
the vasculature for clock regulation in the mesophyll (Fig. 4c). We also 
used CHLOROPHYLL A/B BINDING PROTEIN 3 (CAB3)::CCA1
for mesophyll clock perturbation. In contrast to SUC2::CCA1, dysfunction
of the mesophyll circadian clock affected circadian rhythms
only in mesophyll, and TOC1 expression in the vasculature still oscillated
persistently. Thus, at least in this condition, asymmetric dominance
of the vasculature clock over the mesophyll clock was revealed.

Figure 4 | The vasculature clock is robust and dominant to other clocks.
a, TOC1 expression in whole leaf and vasculature under L/D and continuous
light free running conditions. Days 5 to 9 and day 12 are shown. Mean ± s.e.m.
(days 5-9, n = 3; and day 12, n = 4). b, Luminescence of TOC1::LUC (n = 22)
and TOC1::LUC; SUC2::CCA1 #18 (n = 24) seedlings grown under L/D
and continuous light free running conditions. Days 5 to 9 are shown.
Mean ± s.d. c, TOC1 expression in whole leaf, mesophyll and vasculature from
10-day-old wild-type, CAB3::CCA1 and SUC2::CCA1 seedlings. Plants were
grown under L/D for 5 days and then transferred into free running conditions
and analysed. Mean ± s.e.m.; n = 3. d, Flowering time and FT expression
analysis under long-day conditions. Mean ± s.d.; n = 12. Promoters of
3-KETOACYL-COA SYNTHASE 6 (CER6), UNUSUAL FLORAL ORGAN
(UFO) and TERPENE SYNTHASE-LIKE SEQUENCE-1,8-CINEOLE (TPS-CIN)
were used as epidermis, shoot apical meristem, and hypocotyl/root promoters,
respectively. FT expression was detected at ZT16 of long-day grown 10-day-
old seedlings. Mean ± s.d.; n = 3. a, c, d, The gene expression was checked by
qPCR. e, Our model proposes that the vasculature (phloem companion cells)
clock and mesophyll clock asymmetrically affect each other in leaves. Through
long- and short-distance signalling, the vasculature clock regulates the
mesophyll clock and photoperiodic flowering.

Finally, we investigated whether the vasculature clock can affect a phy- 
siological response. In plants, the circadian clock and photoperiodism 
are tightly coupled, and many clock mutations affect photoperiodic
flowering. We therefore generated a set of transgenic lines that express
CCA1-GFP driven by different tissue-specific promoters that we had already
tested in a previous study (Extended Data Fig. 9). Among them,
only CCA1::CCA1 and SUC2::CCA1 showed a late-flowering phenotype
under flowering-inductive long-day conditions (Fig. 4d). In addition,
the expression levels of FLOWERING LOCUS T (FT) were quite
consistent with the flowering phenotypes (Fig. 4d). Hence, the vascula- 
ture clock regulates a whole plant physiological response by regulating 
the dynamics of FT (Fig. 4e). 

By combining two powerful tools for tissue-specific analysis—a rapid, 
direct tissue isolation method and the TSLA—we have been able to inves- 
tigate the tissue-specific regulation of the Arabidopsis circadian clock 
system. 

We have demonstrated that the vasculature clock system is distinct 
and robust; moreover, it is able to control neighbouring mesophyll cell 
gene expression and a physiological response. In that sense, the vascu- 
lature and mesophyll clocks in Arabidopsis constitute a layered clock 
system, similar to the central and peripheral clocks in mammals or the
evening cells and morning cells in Drosophila (Fig. 4e).

Our findings can explain specific functions of the clock in vascula- 
ture and mesophyll, but additional tissue-specific analysis with high 
spatiotemporal resolution will be required to elucidate the contribu- 
tions of as-yet undefined clock genes to the robustness and sensitivity 
of the hierarchical circadian clock circuitry that we have uncovered. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Given the global burden of diarrhoeal diseases’, it is important to 
understand how members of the gut microbiota affect the risk for, 
course of, and recovery from disease in children and adults. The acute, 
voluminous diarrhoea caused by Vibrio cholerae represents a dra- 
matic example of enteropathogen invasion and gut microbial commu- 
nity disruption. Here we conduct a detailed time-series metagenomic 
study of faecal microbiota collected during the acute diarrhoeal and 
recovery phases of cholera in a cohort of Bangladeshi adults living in 
an area with a high burden of disease”. We find that recovery is char- 
acterized by a pattern of accumulation of bacterial taxa that shows 
similarities to the pattern of assembly/maturation of the gut microbi- 
ota in healthy Bangladeshi children’. To define the underlying mech- 
anisms, we introduce into gnotobiotic mice an artificial community 
composed of human gut bacterial species that directly correlate with 
recovery from cholera in adults and are indicative of normal micro- 
biota maturation in healthy Bangladeshi children’. One of the spe- 
cies, Ruminococcus obeum, exhibits consistent increases in its relative 
abundance upon V. cholerae infection of the mice. Follow-up analyses, 
including mono- and co-colonization studies, establish that R. obeum 
restricts V. cholerae colonization, that R. obeum luxS (autoinducer-2 
(AI-2) synthase) expression and AI-2 production increase significantly 
with V. cholerae invasion, and that R. obeum AI-2 causes quorum- 
sensing-mediated repression of several V. cholerae colonization fac- 
tors. Co-colonization with V. cholerae mutants discloses that R. obeum 
AI-2 reduces Vibrio colonization/pathogenicity through a novel path- 
way that does not depend on the V. cholerae AI-2 sensor, LuxP. The 
approach described can be used to mine the gut microbiota of Ban- 
gladeshi or other populations for members that use autoinducers 
and/or other mechanisms to limit colonization with V. cholerae, or 
conceivably other enteropathogens. 

We used an approved protocol for recruiting Bangladeshi adults liv- 
ing in Dhaka Municipal Corporation area for this study. Of the 1,153 
patients with acute diarrhoea who were screened, seven passed all entry 
criteria (Methods) and were enrolled (Supplementary Tables 1 and 2). 
Faecal samples collected at monthly intervals during the first 2 post- 
natal years from 50 healthy children living in the Mirpur area of Dhaka 
city, plus samples obtained at approximately 3-month intervals over a 
1-year period from 12 healthy adult males also living in Mirpur, allowed 
us to compare recovery of the microbiota from cholera with the nor- 
mal process of assembly of the gut community in infants and children, 
and with unperturbed communities from healthy adult controls. 

Using the standard treatment protocol of the International Centre for 
Diarrhoeal Disease Research, Bangladesh, study participants with acute 
cholera received a single oral dose of azithromycin and were given oral 
rehydration therapy for the duration of their hospital stay. Patients were 
discharged after their first solid stool. We divided the diarrhoeal period 
(from the first diarrhoeal stool after admission to the first solid stool) 
into four proportionately equal time bins: diarrhoeal phase 1 (D-Ph1) 
to D-Ph4. Every diarrhoeal stool was collected from every participant. 
Faecal samples were also collected every day for the first week after dis- 
charge (recovery phase 1, R-Ph1), weekly during the next 3 weeks (R- 
Ph2), and monthly for the next 2 months (R-Ph3). For each individual, 
we selected a subset of samples from D-Ph1 to D-Ph3 (Methods), plus 
all samples from D-Ph4 to R-Ph3, for analysis of bacterial composition 
by sequencing PCR amplicons generated from variable region 4 (V4) 
of the 16S ribosomal RNA (rRNA) gene (Supplementary Information, 
Extended Data Fig. 1a and Supplementary Table 3). Reads sharing 97% 
nucleotide sequence identity were grouped into operational taxonomic 
units (97%-identity OTUs; Methods). 
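
The binning of the diarrhoeal period described above can be illustrated with a minimal sketch. It assumes that sample times are expressed in hours after the first diarrhoeal stool, and that a sample falling exactly on a bin boundary belongs to the later bin; the function name and these conventions are illustrative assumptions, not the procedure given in Methods.

```python
def assign_diarrhoeal_phase(sample_time, start, end, n_bins=4):
    """Map a sample time to one of n_bins proportionately equal bins.

    start and end are the times of the first diarrhoeal stool after
    admission and of the first solid stool (hypothetical units: hours).
    Returns a label such as 'D-Ph1' ... 'D-Ph4'.
    """
    span = end - start
    if span <= 0:
        raise ValueError("end must be later than start")
    if not start <= sample_time <= end:
        raise ValueError("sample lies outside the diarrhoeal period")
    # Fraction of the diarrhoeal period elapsed when the sample was taken.
    fraction = (sample_time - start) / span
    # The final time point (fraction == 1) is assigned to the last bin.
    index = min(int(fraction * n_bins), n_bins - 1)
    return f"D-Ph{index + 1}"


# Hypothetical patient whose diarrhoeal period lasted 40 hours.
for t in (0, 9.9, 10, 25, 40):
    print(t, assign_diarrhoeal_phase(t, start=0, end=40))
```

Because the bins are defined relative to each patient's own diarrhoeal period, patients with different durations of illness can be compared phase by phase.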

We identified a total of 1,733 97%-identity OTUs assigned to 343 dif- 
ferent species after filtering and rarefaction (Methods). V. cholerae dom- 
inated the microbiota of the seven patients with cholera during D-Ph1 
(mean maximum relative abundance 55.6%), declining markedly within 
hours after initiation of oral rehydration therapy. The microbiota then 
became dominated by either an unidentified Streptococcus species (maxi- 
mum relative abundance 56.2-98.6%) or by Fusobacterium species (19.4- 
65.1% in patients B-E). In patient G, dominance of the community passed 
from a Campylobacter species (58.6% maximum) to a Streptococcus spe- 
cies (98.6% maximum) (Supplementary Table 4). Of the 343 species, 
47.9 ± 6.6% (mean ± s.d.) were observed throughout both the diarrhoeal 
and recovery phases, suggesting that microbiota composition during the 
recovery phase may reflect an outgrowth from reservoirs of bacteria re- 
tained during disruption by diarrhoea (Extended Data Fig. 2a–d and Sup- 
plementary Information). 

Indicator species analysis* (Methods) was used to identify 260 bacte- 
rial species consistently associated with the diarrhoeal or recovery phases 
across members of the study group, and in a separate analysis for each 
subject (Supplementary Table 5). The relative abundance of each of the 
discriminatory species in each faecal sample was compared with the 
mean weighted phylogenetic (UniFrac*) distance between that micro- 
biota sample and all microbiota samples collected from the reference 
cohort of healthy Bangladeshi adults. The results revealed 219 species 
with significant indicator value assignments to diarrhoeal or recovery 
phases, and relative abundances with statistically significant Spearman’s 
rank correlation values to community UniFrac distance to healthy con- 
trol microbiota (Supplementary Table 6 and Extended Data Fig. 2d). 
Not surprisingly, the abundance of V. cholerae directly correlated with 
increased distance to a healthy microbiota. Streptococcus and Fusobac- 
terium species, which bloomed during the early phases of diarrhoea, 
were also significantly and positively correlated with distance from a 
healthy adult microbiota. Increases in the relative abundances of spe- 
cies in the genera Bacteroides, Prevotella, Ruminococcus/Blautia, and 
Faecalibacterium (for example, Bacteroides vulgatus, Prevotella copri, 
R. obeum, and Faecalibacterium prausnitzii) were strongly correlated with 
a shift in community structure towards a healthy adult configuration 
(Extended Data Fig. 2d and Supplementary Table 6). 
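
The correlation step described above can be sketched as follows. This is a minimal, dependency-free illustration of Spearman's rank correlation between a species' relative abundance and the mean UniFrac distance to healthy controls; the function names and the example values are hypothetical, and the significance testing and multiple-comparison handling used in the study are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: abundance of one species in five faecal samples
# and each sample's mean UniFrac distance to healthy adult microbiota.
abundance = [0.55, 0.30, 0.12, 0.04, 0.01]
distance_to_healthy = [0.92, 0.85, 0.70, 0.66, 0.51]
print(spearman_rho(abundance, distance_to_healthy))
```

A positive rho, as in this made-up example, would correspond to a taxon whose abundance rises as the community moves away from a healthy configuration; a negative rho would correspond to a taxon associated with recovery.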
Previously we used Random Forests, a machine-learning algorithm, 
to identify a collection of age-discriminatory bacterial taxa that together 
define different stages in the postnatal assembly/maturation of the gut 
microbiota in healthy Bangladeshi children living in the same area as 
the adult patients with cholera’. Of those 60 most age-discriminatory 
97%-identity OTUs representing 40 different species, 31 species were 
present in adult patients with cholera. Intriguingly, they followed a sim- 
ilar progression of changing representation during diarrhoea to recovery 
as they do during normal maturation of the healthy infant gut micro- 
biota (Extended Data Fig. 2d). Twenty-seven of the 31 species were sig- 
nificantly associated with recovery from diarrhoea by indicator species 
analysis (see Supplementary Information and Extended Data Figs 3-5 
for OTU-level and community-wide analyses). These 27 species, which 
serve as indicators and are potential mediators of restoration of the gut 
microbiota after cholera, guided construction of a gnotobiotic mouse 
model that examined the molecular mechanisms by which some of these 
taxa might affect V. cholerae infection and promote restoration. 

We assembled an artificial community of 14 sequenced human gut 
bacterial species (Supplementary Table 7) that included (1) five species 
that directly correlated with gut microbiota recovery from cholera and 
with normal maturation of the infant gut microbiota (R. obeum, Rum- 
inococcus torques, F. prausnitzii, Dorea longicatena, Collinsella aerofa- 
ciens), (2) six species significantly associated with recovery from cholera 
by indicator species analysis (Bacteroides ovatus, Bacteroides vulgatus, 
Bacteroides caccae, Bacteroides uniformis, Parabacteroides distasonis, 
Eubacterium rectale), and (3) three prominent members of the adult 
human gut microbiota that have known capacity to process dietary and 
host glycans (Bacteroides cellulosilyticus, Bacteroides thetaiotaomicron, 
Clostridium scindens**; as noted in Extended Data Fig. 6 and Supplemen- 
tary Table 8, shotgun sequencing of diarrhoeal- and recovery-phase human 
faecal DNA samples revealed that genes encoding enzymes involved in 
carbohydrate metabolism were the largest category of identified genes 
specifying known enzymes that changed in relative abundance within 
the faecal microbiome during the course of cholera). One group of mice 
was directly inoculated with approximately 10° colony-forming units 
(c.f.u.) of V. cholerae at the same time they received the 14-member 
community to simulate the rapidly expanding V. cholerae population 
during diarrhoea (‘D1invasion’ group). A separate group was gavaged 
with the community alone and then invaded 14 days later with V. chol- 
erae (‘D14invasion’ group) (Extended Data Fig. 1c). 

V. cholerae levels remained high in the D1invasion group 
over the first week (maximum 46.3% relative abundance), and then de- 
clined rapidly to low levels (<1%). Introduction of V. cholerae into the 
established 14-member community produced much lower levels of 
V. cholerae infection (range of mean abundances measured daily over 
the 3 days after gavage of the enteropathogen, 1.2-2.7%; Supplemen- 
tary Table 9). Control experiments demonstrated that V. cholerae was 
able to colonize at high levels for at least 7 days when it was introduced 
alone into germ-free recipients (10°-10'° c.f.u. per milligram wet weight 
of faeces; Fig. 1a). Together, these data suggest that a member or mem- 
bers of the artificial human gut microbiota had the ability to restrict 
V. cholerae colonization. 

Changes in relative abundances of the 14 community members in fae- 
cal samples in response to V. cholerae were consistent for most species 
across the D1invasion and D14invasion mice (Supplementary Table 9). 
We focused on one member, R. obeum, because its relative abundance 
increased significantly after introduction of V. cholerae in both the 
D1invasion and D14invasion groups (Extended Data Fig. 7a and Sup- 
plementary Table 9) and because it is a prominent age-discriminatory 
taxon in the Random Forests model of gut microbiota maturation in 
healthy Bangladeshi children’ (Extended Data Fig. 4b). Mice were mono- 
colonized with either R. obeum or V. cholerae for 7 days and then the 
other species was introduced (Extended Data Fig. 1d). When R. obeum 
was present, V. cholerae levels declined by 1-3 logs (Fig. 1a). Germ-free 
mice were also colonized with the defined 14-member community or the 
same community without R. obeum for 2 weeks, and V. cholerae was 
then introduced by gavage (Extended Data Fig. 1e). V. cholerae levels 
1 day after gavage were 100-fold higher in the community that lacked 
R. obeum; these differences were sustained over time (50-fold higher 
after 7 days; P < 0.01, unpaired Mann-Whitney U-test; Fig. 1a). 

Figure 1 | R. obeum restricts V. cholerae colonization in adult gnotobiotic 
mice. a, V. cholerae levels in the faeces of mice colonized with the indicated 
human gut bacterial species (n = 4-6 mice per group). b, Expression of 
R. obeum luxS AI-2 synthase in the 14-member community 4 days after 
introduction of 10° c.f.u. of V. cholerae or no pathogen (n = 5 mice per group). 
Note that D. longicatena levels fall precipitously after V. cholerae invasion 
(Supplementary Table 9). Mean values ± s.e.m. are shown. ND, not detected. 
*P < 0.05, **P < 0.01 (unpaired Mann-Whitney U-test). 

Having established that R. obeum restricts V. cholerae colonization, 
we used microbial RNA sequencing (RNA-seq) of faecal RNAs to deter- 
mine the effect of R. obeum on expression of known V. cholerae viru- 
lence factors in mono- and co-colonized mice. Co-colonization led to 
reduced expression of tcpA (a primary colonization factor in humans”"*), 
rtxA and hlyA (encode accessory toxins'’"*), and VC1447-VC1448 (RtxA 
transporters) (threefold to fivefold changes; P< 0.05 compared with 
V. cholerae mono-colonized controls, Mann-Whitney U-test; see Sup- 
plementary Information and Supplementary Table 10 for other regu- 
lated genes that could impact colonization, plus Extended Data Fig. 8 
for an ultra-performance liquid chromatography mass spectrometry 
(UPLC-MS) analysis of bile acids reported to affect V. cholerae gene 
regulation’). 

Two quorum-sensing pathways are known to regulate V. cholerae 
colonization/virulence'*”: an intra-species mechanism involving cholera 
autoinducer-1, and an inter-species mechanism involving autoinducer-2 
(refs 18, 19). Quorum sensing disrupts expression of V. cholerae viru- 
lence determinants through a signalling pathway that culminates in 
production of the LuxR-family regulator HapR'*’’. Repression of quo- 
rum sensing in V. cholerae is important for virulence factor expression 
and infection””**. The luxS gene encodes the S-ribosylhomocysteine 
lyase responsible for AI-2 synthesis. Homologues of luxS are widely dis- 
tributed among bacteria’*””, including 8 of the 14 species in the artificial 
human gut community (Supplementary Table 11 and Extended Data 
Fig. 9). RNA-seq of the faecal meta-transcriptomes of D1invasion mice 
colonized with the 14-member artificial community plus V. cholerae, 
and mice harbouring the 14-member consortium without V. cholerae, 
revealed that, of the predicted luxS homologues in the community, only ex- 
pression of R. obeum luxS (RUMOBE02774) increased significantly in 
response to V. cholerae (P < 0.05, Mann-Whitney U-test; Fig. 1b). More- 
over, R. obeum luxS transcript levels directly correlated with V. cholerae 
levels (Extended Data Fig. 7c). 

In addition to luxS, the R. obeum strain represented in the artificial 
community contains homologues of lsrABCK that are responsible for 
import and phosphorylation of AI-2 in Gram-negative bacteria”’, as well 
as homologues of two genes, luxR and luxQ, that play a role in AI-2 sens- 
ing and downstream signalling in other organisms™. Expression of all 
these R. obeum genes was detected in vivo, consistent with R. obeum 
having a functional AI-2 signalling system (Extended Data Fig. 7b). (See 
Supplementary Information for results showing that R. obeum AI-2 
production is stimulated by V. cholerae in vitro and in co-colonized 
animals (Extended Data Fig. 7d–f), plus (1) a genome-wide analysis of the 
effects of V. cholerae on R. obeum transcription in co-colonized mice 
(Supplementary Table 10c) and (2) a community-wide view of the tran- 
scriptional responses of the 14-member consortium to V. cholerae (Sup- 
plementary Table 12).) 

Quorum sensing downregulates the V. cholerae tcp operon that en- 
codes components of the toxin co-regulated pilus (TCP) biosynthesis 
pathway required for infection of humans””’. To confirm that R. obeum 
LuxS could signal through AI-2 pathways, we cloned R. obeum and V. 
cholerae luxS downstream of the arabinose-inducible PBAD promoter 
in plasmids that were maintained in an Escherichia coli strain unable 
to produce its own AI-2 (DH5α)”*. High tcp expression can be induced 
in V. cholerae after slow growth in AKI medium without agitation fol- 
lowed by rapid growth under aerobic conditions”*. Addition of culture 
supernatants harvested from the E. coli strains expressing R. obeum or 
V. cholerae luxS caused a two- to threefold reduction in tcp induction 
in V. cholerae (P< 0.05, unpaired Student’s t-test; replicated in four 
independent experiments). Supernatants from a control E. coli strain 
with the plasmid vector lacking luxS had no effect (Fig. 2a). These find- 
ings are consistent with our in vivo RNA-seq results and provide direct 
evidence that R. obeum AI-2 regulates expression of a V. cholerae viru- 
lence factor. 

Germ-free mice were then colonized with V. cholerae and E. coli bear- 
ing either the PBAD-R. obeum luxS plasmid or the vector control. Mice 
that received E. coli expressing R. obeum luxS showed a significantly 
lower level of V. cholerae colonization 8 h after gavage than mice that 
received E. coli with vector alone (Fig. 2b; there was no statistically sig- 
nificant difference in levels of E. coli between the two groups (data not 
shown)). Together, these results establish a direct causal relationship 
between R. obeum-mediated restriction of V. cholerae colonization and 
R. obeum AI-2 synthesis. 

Several V. cholerae mutants were used to determine whether known 
V. cholerae AI-2 signalling pathways are required for the observed ef- 
fects of R. obeum on V. cholerae colonization. LuxP is critical for sens- 
ing AI-2 in V. cholerae. Co-colonization experiments in gnotobiotic mice 
revealed that levels of isogenic ΔluxP or wild-type luxP+ V. cholerae 
strains were not significantly different as a function of the presence of 
R. obeum (Extended Data Fig. 10a), suggesting that R. obeum modu- 
lates V. cholerae levels through other quorum-sensing regulatory genes. 
The luxO and hapR genes encode central regulators linking known V. 
cholerae quorum-signalling and virulence regulatory pathways. Dele- 
tion of luxO typically results in increased hapR expression’. However, 
our RNA-seq analysis had shown that both luxO and hapR are repressed 
in the presence of R. obeum (six- to sevenfold, P< 0.0001; Mann- 
Whitney U-test), as are two important downstream activators of viru- 
lence repressed by HapR"*, encoded by aphA and aphB. These findings 
provide additional evidence that R. obeum operates to regulate viru- 
lence through a novel regulatory pathway. 

The quorum-sensing transcriptional regulator VqmA was upregulated 
more than 25-fold when V. cholerae was introduced into mice mono- 
colonized with R. obeum (Fig. 2c and Supplementary Table 10). When 
germ-free mice were gavaged with R. obeum and a mixture of ΔvqmA 
(ΔlacZ)”’ and wild-type V. cholerae (lacZ+) strains, the ΔvqmA mutant 
exhibited an early competitive advantage (Fig. 2d), suggesting that R. 
obeum may be able to affect early colonization of V. cholerae through 
VqmA. VqmA is able to bind to and activate the hapR promoter directly”. 
Since RNA-seq showed that hapR activation did not occur in gnoto- 
biotic mice despite high levels of vqmA expression (Extended Data Fig. 
10b and Supplementary Table 10), we postulate that the role played by 
VqmA in R. obeum modulation of Vibrio virulence genes involves an 
uncharacterized mechanism rather than the known pathway passing 
through HapR. 
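
The competitive advantage mentioned above is usually quantified as a competitive index. This section does not define the index, so the sketch below assumes the commonly used definition: the mutant-to-wild-type ratio recovered from the animal divided by the same ratio in the inoculum. The function name and the colony counts are hypothetical.

```python
def competitive_index(mutant_out, wild_type_out, mutant_in, wild_type_in):
    """Output mutant:wild-type ratio divided by the input ratio.

    Assumed definition (not taken from the paper's Methods). Counts could
    be, for example, colonies of each strain distinguished by lacZ status.
    A value above 1 indicates that the mutant outcompeted the wild type.
    """
    counts = (mutant_out, wild_type_out, mutant_in, wild_type_in)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive")
    return (mutant_out / wild_type_out) / (mutant_in / wild_type_in)


# Hypothetical counts: a 1:1 inoculum and a 3:1 ratio recovered from faeces.
print(competitive_index(300, 100, 50, 50))  # 3.0
```

Normalizing by the input ratio means that an inoculum that was not mixed exactly 1:1 does not bias the result.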
Figure 2 | R. obeum AI-2 reduces V. cholerae colonization and virulence 
gene expression. a, R. obeum AI-2 produced in E. coli represses the tcp 
promoter in V. cholerae (triplicate assays; results representative of four 
independent experiments). b, Faecal V. cholerae levels in gnotobiotic mice 
8 h after gavage with V. cholerae and an E. coli strain containing either the 
PBAD-R. obeum luxS plasmid or vector control. c, Faecal vqmA transcript 
abundance in mono- or co-colonized mice. d, Competitive index of ΔvqmA 
versus wild-type V. cholerae during co-colonization with R. obeum (n = 5 
animals per group). Mean values ± s.e.m. are shown. *P < 0.05, **P < 0.01, 
****P < 0.0001 (unpaired two-tailed Student’s t-test). 


We have identified a set of bacterial species that strongly correlate 
with a process in which the perturbed gut bacterial community in adult 
patients with cholera is restored to a configuration found in healthy Ban- 
gladeshi adults. Several of these species are also associated with the nor- 
mal assembly/maturation of the gut microbiota in Bangladeshi infants 
and children, raising the possibility that some of these taxa may be use- 
ful for ‘repair’ of the gut microbiota in individuals whose gut communities 
have been ‘wounded’ through a variety of insults, including enteropatho- 
gen infections. Translating these observations to a gnotobiotic mouse 
model containing an artificial human gut microbiota composed of 
recovery- and age-indicative taxa established that one of these species, 
R. obeum, reduces V. cholerae colonization. As an entrenched member 
of the gut microbiota in Bangladeshi individuals, R. obeum could func- 
tion to increase median infectious dose (ID50) for V. cholerae in humans 
and thus help to determine whether exposure to a given dose of this en- 
teropathogen results in diarrhoeal illness. The modest effects of R. obeum 
AI-2 on V. cholerae virulence gene expression in our adult gnotobiotic 
mouse model may reflect the possibility that we have only identified a 
small fraction of the microbiota’s full repertoire of virulence-suppressing 
mechanisms. Culture collections generated from the faecal microbiota 
of Bangladeshi subjects are a logical starting point for ‘second-generation’ 
artificial communities containing R. obeum isolates that have evolved 
in this population, and for testing whether the observed effects of R. 
obeum generalize across many different strains from different popula- 
tions. Moreover, the strategy described in this report could be used to 
mine the gut microbiota of Bangladeshi or other populations where di- 
arrhoeal disease is endemic for additional species that use quorum- 
related and/or other mechanisms to limit colonization by V. cholerae 
and potentially other enteropathogens. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Invasion of host erythrocytes is essential to the life cycle of Plasmo- 
dium parasites and development of the pathology of malaria. The 
stages of erythrocyte invasion, including initial contact, apical reori- 
entation, junction formation, and active invagination, are directed 
by coordinated release of specialized apical organelles and their par- 
asite protein contents’. Among these proteins, and central to invasion 
by all species, are two parasite protein families, the reticulocyte-binding 
protein homologue (RH) and erythrocyte-binding like proteins, which 
mediate host-parasite interactions’. RH5 from Plasmodium falci- 
parum (PfRH5) is the only member of either family demonstrated 
to be necessary for erythrocyte invasion in all tested strains, through 
its interaction with the erythrocyte surface protein basigin (also known 
as CD147 and EMMPRIN)**. Antibodies targeting PfRH5 or basi- 
gin efficiently block parasite invasion in vitro*°, making PfRH5 an 
excellent vaccine candidate. Here we present crystal structures of 
PfRH5 in complex with basigin and two distinct inhibitory antibodies. 
PfRH5 adopts a novel fold in which two three-helical bundles come 
together in a kite-like architecture, presenting binding sites for basi- 
gin and inhibitory antibodies at one tip. This provides the first struc- 
tural insight into erythrocyte binding by the Plasmodium RH protein 
family and identifies novel inhibitory epitopes to guide design of a 
new generation of vaccines against the blood-stage parasite. 

Each Plasmodium species contains at least one RH protein. These are 
often large, of low sequence complexity and with no homology to pro- 
teins of known structure. PfRH5 is unusual in being significantly shorter 
than its homologues (~60 kDa for PfRH5 vs 200-375 kDa for other RH 
proteins). It lacks their carboxy-terminal transmembrane segment, but 
associates peripherally with the membrane and with PfRH5 interacting 
protein (PfRipr)"°. Although it shares only ~20% pairwise sequence iden- 
tity with other PfRH proteins*"’, PfRH5 is remarkably conserved, with 
only five common non-synonymous single nucleotide polymorphisms 
(SNPs)’*””. Crucially, antibodies raised against one PfRH5 variant neu- 
tralise parasites of all tested heterologous strains, containing these and 
other less common SNPs**”, and anti-PfRH5 monoclonal antibodies that 
prevent parasite growth in vitro can directly block the PfRH5-basigin 
interaction’. Moreover, acquisition of anti-PfRH5 antibodies during 
natural infection correlates with clinical outcome and these antibodies 
can also inhibit parasite growth in vitro'’. These findings have gener- 
ated intense excitement about PfRH5 as a next-generation blood-stage 
malaria vaccine target and emphasized the need for structural informa- 
tion to guide rational immunogen design. 

Structural studies of PfRH5 required a protein construct lacking flex- 
ible regions but still capable of binding basigin. Long disordered regions 
were predicted within residues 1-140 and 248-296 (Extended Data 
Fig. 1a), and in cultured parasite lines, PfRH5 is processed by removal 
of the amino terminus to generate a ~45 kDa fragment*"®. We there- 
fore designed PfRH5ΔNL, encompassing residues 140-526 but lacking 
248-296, and showed that it binds basigin by surface plasmon resonance 
with an affinity of 1.3 μM (Fig. 1c), comparable to the affinity of full- 
length PfRH5 for basigin (1.1 μM)’. 
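
To give a sense of what a micromolar affinity implies, the sketch below computes the equilibrium fraction of PfRH5 occupied by basigin at a given free basigin concentration. It assumes a simple 1:1 binding model in which fractional occupancy equals [L] / (K_D + [L]); the function name and the example concentrations are illustrative, and this is not the fitting procedure used to analyse the surface plasmon resonance data.

```python
def fraction_bound(ligand_conc_um, kd_um):
    """Equilibrium fractional occupancy for 1:1 binding.

    ligand_conc_um: free ligand concentration in micromolar (assumed known).
    kd_um: dissociation constant in micromolar.
    """
    if ligand_conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and K_D must be > 0")
    return ligand_conc_um / (kd_um + ligand_conc_um)


KD_TRUNCATED = 1.3    # micromolar, truncated construct (value from the text)
KD_FULL_LENGTH = 1.1  # micromolar, full-length PfRH5 (value from the text)

# Hypothetical free basigin concentrations, in micromolar.
for conc in (0.13, 1.3, 13.0):
    print(conc,
          round(fraction_bound(conc, KD_TRUNCATED), 3),
          round(fraction_bound(conc, KD_FULL_LENGTH), 3))
```

Under this model the two constructs differ only slightly in occupancy at any concentration, which is consistent with the statement that the affinities are comparable.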

To ensure that PfRH5ΔNL contains epitopes required to elicit an in- 
hibitory immune response, we raised rabbit polyclonal IgG and tested 
their ability to neutralise parasites by a growth-inhibitory activity (GIA) 
assay (Fig. 1d). IgG raised against PfRH5ΔNL protein showed a potent 
inhibitory effect, similar to that of IgG raised by immunisation of rabbits 
with viral vectors expressing full-length PfRH5°, or full-length PfRH5 
recombinant protein®”. We also tested binding of PfRH5ΔNL to a panel 
of mouse monoclonal antibodies previously characterized for PfRH5 
binding and growth-inhibitory activity’. PfRH5ΔNL bound to growth- 
inhibitory antibodies including QA1, QA5 and 9AD4, but not to non- 
inhibitory 4BA7 and RB3 (Extended Data Fig. 2). Thus, PfRH5ΔNL 
induces a growth-inhibitory immune response, and contains the epi- 
topes targeted by inhibitory antibodies. 

For structural studies, PfRH5ΔNL was mixed with basigin or frag- 
ments of growth-inhibitory monoclonal antibodies, 9AD4 or QA1. The 
complexes were trimmed using endoproteinase GluC and lysines chem- 
ically methylated before crystallization. Crystals formed and data were 
collected to 3.1 Å (PfRH5ΔNL-basigin), 2.3 Å (PfRH5ΔNL-9AD4) and 
3.1 Å resolution (PfRH5ΔNL-QA1). Structures were determined using 
molecular replacement (Extended Data Table 1). 

Figure 1 | The structure of PfRH5. a, Three views of PfRH5ΔNL (from the 
PfRH5ΔNL-9AD4 structure) and a schematic topology diagram, coloured as a 
rainbow from blue (N terminus) to red (C terminus). Disulphide bonds 
are indicated on the topology diagram by red lines. b, PfRH5ΔNL structure 
docked into a SAXS envelope of full-length PfRH5. c, Surface plasmon 
resonance analysis of the PfRH5ΔNL-basigin interaction. RU, response 
units. d, In vitro growth inhibition activity (GIA) of IgG from rabbits 
immunised with PfRH5ΔNL against 3D7 (red) and 7G8 (blue) P. falciparum 
strains. The error bars are standard error of the mean (n = 3). 

Figure 2 | The structure of the PfRH5-basigin complex. a, The structure of 
PfRH5ΔNL (yellow) bound to basigin (blue). b, A top view of the PfRH5ΔNL- 
basigin complex showing the two conformations of basigin (blue and cyan) 
found in the asymmetric unit, aligned on PfRH5ΔNL. c, Equilibrium analytical 
ultracentrifugation analysis of PfRH5-basigin indicating a 1:1 complex. 

PfRH5 adopts a rigid, flat, ‘kite-shaped’ architecture with a pseudo- 
twofold rotation symmetry and no similarity to known structures (Fig. 1a). 
Each half is predominantly built from a three-helical bundle, with the 
outermost helices containing significant kinks or breaks. The N-terminal 
half begins with a short, two-stranded β-sheet that crosses the long axis 
of the kite at its centre. This is followed by a single, short helix and two 
long, kinked helices connected by the truncated loop (containing 58 res- 
idues in full-length PfRH5). The C-terminal half is simpler, consisting 
of three long helices that span the entire length of the domain and fin- 
ishing with a flexible C terminus. One disulphide bond (C345-C351) 
stabilizes the loop that links the two halves of the structure, while an- 
other links the second and third helices (C224–C317), leaving one un- 
paired cysteine (C329). 

PfRH5 is predominantly rigid, with five copies in the three different 
crystal forms aligning with an r.m.s.d. of 0.9 Å over 95% of residues (Ex- 
tended Data Fig. 1b). Only the C terminus (residues 496-end) and the 
loop linking helices 4 and 5 (residues 396-406) adopt different posi- 
tions in different crystal forms. A molecular envelope derived from small 
angle X-ray scattering (SAXS) analysis of full-length PfRH5 in solution 
exhibits a similar flat structure (Fig. 1b, Extended Data Fig. 3). This en- 
velope is elongated relative to PfRH5ΔNL, most probably owing to res- 
idues missing in this construct or not ordered in the crystal structure 
(22 residues at the C terminus, the flexible loop, and perhaps part of the 
extended N terminus). 
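
The r.m.s.d. quoted above measures how closely equivalent atoms in two copies of the structure coincide. The sketch below computes it for two coordinate sets that are assumed to be already superimposed and listed in the same atom order; the superposition step itself (for example a least-squares fit) is omitted, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D coordinates.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples
    in angstroms, assumed to be pre-aligned and in matching atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical pair of two-atom fragments displaced by 1 angstrom along z.
copy_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
copy_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(copy_1, copy_2))  # 1.0
```

Restricting the calculation to the well-ordered residues, as in the 95% of residues mentioned above, prevents a few flexible regions from dominating the value.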

As members of the Plasmodium RH family share little sequence iden- 
tity, sequence alignments and structure-based threading were used to 
predict whether other members contain the PfRH5 fold. In each pro- 
tein analysed (P. falciparum RH1, RH2a, RH2b, RH3 and RH4; Plas- 
modium vivax RBP-1 and RBP-2; Plasmodium reichenowi RH5; and 
Plasmodium yoelii Py01365), N-terminal PfRH5-like domains were iden- 
tified with high confidence, despite sequence identities of 14-22% and 
a lack of totally conserved residues or disulphide bonds (Extended Data 
Fig. 4a). Similar residues are located primarily in the interior of the 
domain, where they may stabilize the fold (Extended Data Fig. 4b). In 
PfRH4, the only other RH protein with a known erythrocyte receptor, 
the complement receptor 1 (CR1) binding fragment contains the puta- 
tive PfRH5 fold’*. These PfRH5-like domains are therefore excellent 
candidates for ligand-binding modules in other RH proteins. 

Figure 2 (continued) | d, Close-up of the PfRH5-basigin binding site. Basigin 
residues in the N-terminal domain (pink), the linker (His 102, orange stick), 
and the C-terminal domain (green) contact PfRH5 (grey surface). In the 
alternative basigin conformation in the asymmetric unit, the yellow loop 
contacts PfRH5. 

Basigin binds at the tip of PfRH5, distant from the flexible loop and 
C terminus, with both domains and the intervening linker directly con- 
tacting PfRH5 (Fig. 2a, d). Most of the contact area (~1,350 Å²) occurs 
through hydrogen bonds between the backbone of strands A and G of 
the basigin N-terminal domain and loops at the tip of PfRH5 (Extended 
Data Table 2a). PfRH5 residues F350 and W447 stabilize this interaction 
by packing into hydrophobic pockets on basigin. The limited involve- 
ment of basigin side chains will reduce the potential for basigin escape 
mutants that prevent PfRH5 binding and impair parasite invasion. 

The basigin C-terminal domain and H102 in the linker also directly 
contact PfRH5 (Extended Data Table 2a). The three loops at the tip of 
the basigin C-terminal domain (linking strands B and C, strands D and 
E and strands F and G) interact with the second and fourth helices of 
PfRH5 through hydrogen bonds and a hydrophobic patch contributed 
by residues VPP from the BC loop. However, flexibility of the basigin 
linker allows different orientations of the C-terminal domains in the two 
copies in the asymmetric unit of the crystal. Chain B interacts through 
the BC and DE loops (a ~650 Å² interface) while chain D interacts 
through the BC and FG loops (~480 Å²) (Fig. 2b), leading to a maximum 
Figure 3 | Structural analysis of binding of invasion-inhibitory antibody 
fragments to PfRH5. a, Crystal structures of PfRH5ΔNL (yellow) bound to 
inhibitory antibody fragments QA1 (red) and 9AD4 (green). Close-up views 
of the PfRH5 epitopes (red) are shown with antibodies as grey surfaces. 

b, Top view of PfRH5ΔNL-9AD4 crystal structure with superimposed basigin 


difference of ~18 Å in the position of the C terminus in the two complexes. 
Flexibility is also predicted from SAXS analysis of the complex 
in solution (Extended Data Fig. 3). While PfRH5 and the basigin 
N-terminal domain fit the SAXS envelope, the C-terminal domain only 
partially fits, consistent with a flexible interaction with PfRH5. 

PfRH5 is highly conserved, with just twelve non-synonymous SNPs 
found in 227 field isolates, and only five at frequencies of 10% or 
greater’*’. These SNPs are distributed across the structure, but do not 
affect residues that directly contact basigin (Extended Data Fig. 5). By 
contrast, in sequenced laboratory strains, eight PfRH5 SNPs are associated 
with increased ability to invade Aotus erythrocytes'*"*. A number of 
these (I204, N347, Y358 and E362) are in or close to the basigin binding 
site, and may affect host tropism. Basigin residues which, when mutated, 
affect PfRH5 affinity (F27 and Q100)” are also located at the interface. 

The two PfRH5-basigin complexes in the asymmetric unit pack together 
through basigin-mediated contacts, including a ~911 Å² interface 
between the two basigin C-terminal domains, bringing their C termini 
into close proximity (Extended Data Fig. 6). As yet, the role of PfRH5 
in invasion is uncertain, but it is tempting to speculate that this 2:2 com- 
plex assembles during invasion, mediating a signalling event in either 
parasite or erythrocyte to trigger an essential downstream process. This 
would leave one face of PfRH5 available for binding of PfRipr’® and other, 
as yet unidentified, binding partners. However, in solution (at concentrations 
≤ 24 μM) we observe no 2:2 complex, either through SAXS (Extended 
Data Fig. 3) or analytical ultracentrifugation (Fig. 2c, Extended 
Data Fig. 7). Whether such a complex assembles at high local concen- 
trations during invasion remains to be elucidated. 

To identify inhibitory epitopes, complexes of PfRH5ΔNL with Fab 
fragments from three inhibitory monoclonal antibodies were studied 
by crystallography and SAXS. QA1 and QA5 were previously shown to 
block PfRH5-basigin binding and parasite growth. 9AD4 does not block 
PfRH5-basigin binding in vitro, but is one of the most effective anti- 
bodies currently available for inhibiting parasite growth’. Crystal struc- 
tures of PfRH5 bound to QA1 and 9AD4 were confirmed by SAXS, while 
a model for PfRH5-QA5 was derived from SAXS analysis, guided by a 
previously identified linear epitope (residues 201-213 from helix 2)”. 

The antibodies bind to three distinct sites, close to the vertex of PfRH5 
(Fig. 3, Extended Data Fig. 8, Extended Data Table 2b, c). QA1 binds 
to loops at the PfRH5 tip, overlapping the basigin N-terminal domain 


(BSG; blue) aligned on PfRH5. c, Top view of a model of PfRH5-QA5, in a 
SAXS-derived envelope, with the putative QA5 epitope highlighted red’. 

d, Schematic showing binding sites for the N- and C-terminal domains of 


basigin (BSG-N and BSG-C; blue), QA1 (red), 9AD4 (green) and QA5 (cyan), 
on the structure of PfRH5ΔNL. 


binding site. QA5 predominantly interacts with PfRH5 helix 2, over- 
lapping the basigin C-terminal domain binding site. In contrast, 9AD4 
binds helices 2 and 3, close to, but not overlapping, either basigin bind- 
ing site. This is likely to allow intact 9AD4 IgG to impede erythrocyte 
binding when PfRH5 and basigin are both membrane-tethered. This 
reveals inhibitory epitopes in or close to the basigin binding sites that 
can be targeted to block parasite invasion. 

In summary, PfRH5 adopts a novel architecture formed, as in many 
families of parasite surface proteins’®, from a robust α-helical scaffold. 
This maintains the overall fold by retaining residues required for helical 
packing, while allowing significant surface sequence variation. Sequence 
homology identifies this fold at the N terminus of other RH proteins, 
where it is likely to act as a ligand-binding module. Characterization of 
the PfRH5-basigin complex prompts a range of future experiments 
to investigate the role of PfRH5 in erythrocyte invasion. Furthermore, 
monoclonal antibodies that block parasite growth bind at or close to 
the basigin-binding site. Immunogens containing these regions of PfRH5 
will be important components of a vaccine to prevent P. falciparum 
erythrocyte invasion, thereby crippling the parasite responsible for the 
deadliest form of human malaria. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Ischaemic accumulation of succinate controls 
reperfusion injury through mitochondrial ROS 


Edward T. Chouchani, Victoria R. Pell, Edoardo Gaude, Dunja Aksentijevic, Stephanie Y. Sundier, Ellen L. Robb, 
Angela Logan, Sergiy M. Nadtochiy, Emily N. J. Ord, Anthony C. Smith, Filmon Eyassu, Rachel Shirley, Chou-Hui Hu, 
Anna J. Dare, Andrew M. James, Sebastian Rogatti, Richard C. Hartley, Simon Eaton, Ana S. H. Costa, Paul S. Brookes, 
Sean M. Davidson, Michael R. Duchen, Kourosh Saeb-Parsy, Michael J. Shattock, Alan J. Robinson, Lorraine M. Work, 
Christian Frezza, Thomas Krieg & Michael P. Murphy 


Ischaemia-reperfusion injury occurs when the blood supply to an 
organ is disrupted and then restored, and underlies many disorders, 
notably heart attack and stroke. While reperfusion of ischaemic tissue 
is essential for survival, it also initiates oxidative damage, cell death and 
aberrant immune responses through the generation of mitochondrial 
reactive oxygen species (ROS)' >. Although mitochondrial ROS pro- 
duction in ischaemia reperfusion is established, it has generally been 
considered a nonspecific response to reperfusion’*. Here we develop 
a comparative in vivo metabolomic analysis, and unexpectedly identify 
widely conserved metabolic pathways responsible for mitochondrial 
ROS production during ischaemia reperfusion. We show that selec- 
tive accumulation of the citric acid cycle intermediate succinate is a 
universal metabolic signature of ischaemia in a range of tissues and 
is responsible for mitochondrial ROS production during reperfu- 
sion. Ischaemic succinate accumulation arises from reversal of suc- 
cinate dehydrogenase, which in turn is driven by fumarate overflow 
from purine nucleotide breakdown and partial reversal of the malate/ 
aspartate shuttle. After reperfusion, the accumulated succinate is rap- 
idly re-oxidized by succinate dehydrogenase, driving extensive ROS 
generation by reverse electron transport at mitochondrial complex I. 
Decreasing ischaemic succinate accumulation by pharmacological 
inhibition is sufficient to ameliorate in vivo ischaemia-reperfusion 
injury in murine models of heart attack and stroke. Thus, we have 
identified a conserved metabolic response of tissues to ischaemia 
and reperfusion that unifies many hitherto unconnected aspects of 
ischaemia-reperfusion injury. Furthermore, these findings reveal a 
new pathway for metabolic control of ROS production in vivo, while 
demonstrating that inhibition of ischaemic succinate accumulation 
and its oxidation after subsequent reperfusion is a potential thera- 
peutic target to decrease ischaemia-reperfusion injury in a range of 
pathologies. 

Mitochondrial ROS production is a crucial early driver of ischaemia- 
reperfusion (IR) injury, but has been considered a nonspecific conse- 
quence of the interaction of a dysfunctional respiratory chain with oxygen 
during reperfusion’. Here we investigated an alternative hypothesis: 
that mitochondrial ROS during IR are generated by a specific meta- 
bolic process. To do this, we developed a comparative metabolomics 
approach to identify conserved metabolic signatures in tissues during 
IR that might indicate the source of mitochondrial ROS (Fig. 1a). Liquid 
chromatography-mass spectrometry (LC-MS)-based metabolomic anal- 
ysis of mouse kidney, liver and heart, and rat brain, subjected to ischaemia 
in vivo (Fig. 1a) revealed changes in several metabolites (Supplementary 


Table 1). However, comparative analysis (Supplementary Tables 2 and 
3) revealed that only three were increased across all tissues (Fig. 1b, c 
and Extended Data Fig. 1a). Two metabolites were well-characterized 
by-products of ischaemic purine nucleotide breakdown, xanthine and 
hypoxanthine’, corroborating the validity of our approach. Xanthine 
and hypoxanthine are metabolised by cytosolic xanthine oxidoreductase 
and do not contribute to mitochondrial metabolism’. The third meta- 
bolite, the mitochondrial citric acid cycle (CAC) intermediate succinate, 
increased 3-19-fold to concentrations of 61-729 ng mg⁻¹ wet weight 
across the tested tissues (Fig. 1d, Supplementary Table 4 and Extended 
Data Fig. 1b, c), and was the sole mitochondrial feature of ischaemia that 
occurred universally in a range of metabolically diverse tissues. There- 
fore, we focused on the potential role of succinate in mitochondrial ROS 
production during IR. 
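
The comparative step described above, keeping only metabolites that increase in every ischaemic tissue, amounts to a set intersection. The following is a minimal sketch of that idea, not the analysis pipeline used in this study; the per-tissue metabolite sets are made-up placeholders (only succinate, xanthine and hypoxanthine come from the text), and the function names `conserved` and `fold_change` are hypothetical.

```python
from functools import reduce

# Placeholder sets of metabolites scored as "increased" in each ischaemic tissue.
# Illustrative only: lactate and alanine are invented filler entries.
increased = {
    "kidney": {"succinate", "xanthine", "hypoxanthine", "lactate"},
    "liver": {"succinate", "xanthine", "hypoxanthine", "alanine"},
    "heart": {"succinate", "xanthine", "hypoxanthine", "lactate"},
    "brain": {"succinate", "xanthine", "hypoxanthine"},
}


def conserved(sets_by_tissue):
    """Return the metabolites that appear in the 'increased' set of every tissue."""
    return reduce(set.intersection, sets_by_tissue.values())


def fold_change(ischaemic, normoxic):
    """Ratio of ischaemic to normoxic abundance (same units for both)."""
    return ischaemic / normoxic


common = sorted(conserved(increased))
print(common)
```

With these placeholder inputs only the three metabolites shared by all four tissues survive the intersection, mirroring the logic (though not the data) of the comparison in Fig. 1b, c.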

Because mitochondrial ROS production occurs early in reperfusion’ **”, 
it follows that metabolites fuelling ROS should be oxidized quickly. 
Notably, the succinate accumulated during ischaemia was restored to 
normoxic levels by 5 min reperfusion ex vivo in the heart (Fig. 1e), and 
this was also observed in vivo in the heart (Fig. 1f and Extended Data 
Fig. 2a), brain (Fig. 1g) and kidney (Fig. 1h). Of note, the accumulation 
of succinate by the in vivo heart was proportional to the duration of 
ischaemia (Extended Data Fig. 2a). These changes in succinate were local- 
ized to areas of the tissues where IR injury occurred in vivo, and took 
place without accumulation of other CAC metabolites (Fig. 1f-h). These 
data demonstrate that, uniquely, succinate accumulates markedly during 
ischaemia and is then rapidly metabolised on reperfusion at the same 
time as mitochondrial ROS production increases. 

To determine the mechanisms responsible for succinate accumula- 
tion during ischaemia and explore its role in IR injury we focused on the 
heart, because of the many experimental and theoretical resources avail- 
able. In mammalian tissues succinate is generated by the CAC, via oxi- 
dation of carbons from glucose, fatty acids, glutamate, and the GABA 
(γ-aminobutyric acid) shunt'®”’ (Fig. 2a and Extended Data Fig. 2b). 
To assess the contribution of these carbon sources to the build-up of 
ischaemic succinate we performed an array of ¹³C-isotopologue labelling 
experiments in the ex vivo perfused heart followed by LC-MS analyses. 
Glucose is a major carbon source for the CAC, and therefore ischaemic 
CAC flux to succinate was first investigated by measuring its isotopologue 
distribution after infusion with [U-¹³C]glucose (in which U denotes 
uniformly labelled) (Fig. 2a). As expected, ¹³C-glucose was quickly oxidized 
via the CAC under normoxia, as indicated by the diagnostic (m + 2) 
and (m + 4) isotopologues of the CAC intermediates (Fig. 2b and 
Figure 1 | Comparative metabolomics identifies succinate as a potential 
mitochondrial metabolite that drives reperfusion ROS production. 

a, Comparative metabolomics strategy. b, Hive plot comparative analysis. 
All identified metabolites are included on the horizontal axis, while those 
accumulated (top axis) or depleted (bottom axis) in a particular ischaemic 
tissue are indicated by a connecting arc. Metabolites accumulated commonly 
across all tissues are highlighted. EV, ex vivo; IV, in vivo. c, Prevalence of 
accumulation of metabolites in murine tissues during ischaemia. d, Profile of 
mitochondrial CAC metabolite levels after ischaemia across five ischaemic 
tissue conditions (in vivo heart n = 5, succinate and fumarate n = 9; ex vivo 
heart n = 4, liver n = 4, brain n = 3, kidney n = 4). α-KG, α-ketoglutarate. 


Extended Data Fig. 3). However, the contribution of ¹³C-glucose to 
succinate was significantly reduced in ischaemic hearts (Fig. 2b and 
Extended Data Fig. 3). We then assessed the contribution of fatty acid 
oxidation to the CAC activity by perfusing hearts with [U-¹³C]palmitate 
(Fig. 2a and Extended Data Fig. 4a). The CAC was readily enriched in 
¹³C-carbons derived from palmitate oxidation (Extended Data Fig. 4b). 
However, the contribution of ¹³C-palmitate to succinate was notably 
decreased during ischaemia (Fig. 2c and Extended Data Fig. 4b). Glutamine 
was not a major carbon source for CAC metabolites in normoxia 
or ischaemia (Extended Data Fig. 5a), and the minimal ¹³C-glutamine 
incorporation to α-ketoglutarate was decreased in ischaemia (Extended 
Data Fig. 5b). Finally, inhibition of the GABA shunt with vigabatrin”® 
(Fig. 2a) did not decrease ischaemic succinate accumulation (Fig. 2d and 
Extended Data Fig. 5c, d). Together, these data demonstrate that the major 
carbon sources for the CAC under normoxia do not significantly con- 
tribute to the build-up of succinate during ischaemia, indicating that 
succinate accumulation is not caused by conventional operation of car- 
diac metabolism. 
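
The labelling comparisons above rest on isotopologue distributions: the share of a metabolite pool found at each mass shift (m + 0, m + 2, m + 4 and so on). As a hedged illustration of how such a distribution might be computed from raw peak intensities, the sketch below normalises intensities to fractions and reports the labelled share. It assumes no natural-abundance correction, the intensity values are invented placeholders, and the function names are hypothetical rather than taken from the Methods.

```python
def isotopologue_fractions(intensities):
    """Normalise raw peak intensities {mass_shift: intensity} so they sum to 1."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("no signal detected")
    return {shift: value / total for shift, value in intensities.items()}


def labelled_fraction(intensities):
    """Share of the pool carrying at least one 13C atom (everything except m + 0)."""
    fractions = isotopologue_fractions(intensities)
    return 1.0 - fractions.get(0, 0.0)


# Placeholder succinate intensities keyed by mass shift (arbitrary units).
normoxic = {0: 600.0, 2: 300.0, 4: 100.0}
ischaemic = {0: 900.0, 2: 80.0, 4: 20.0}

print(labelled_fraction(normoxic), labelled_fraction(ischaemic))
```

A lower labelled fraction in the ischaemic sample than in the normoxic one is the pattern the text describes for glucose- and palmitate-derived carbon entering succinate.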

To explore other mechanisms that could lead to succinate accumu- 
lation during ischaemia, we considered earlier speculations that dur- 
ing anaerobic metabolism succinate dehydrogenase (SDH) might act 
in reverse to reduce fumarate to succinate’*-*. Although SDH reversal 
has not been demonstrated in ischaemic tissues, in silico flux analysis 
determined succinate production by SDH reversal during ischaemia as 
the best solution to sustain proton pumping and ATP production when 
metabolites including fumarate, aspartate and malate were available 
(Fig. 2e, Extended Data Fig. 6 and Supplementary Tables 5 and 6). The 
model predicted that fumarate supply to SDH came from two converg- 
ing pathways: the malate/aspartate shuttle (MAS), in which the high 
NADH/NAD⁺ ratio during ischaemia drives malate formation that is 
converted to fumarate’*"*; and AMP-dependent activation of the purine 
nucleotide cycle (PNC) that drives fumarate production’”’* (Fig. 2e 
and Extended Data Fig. 6). To test this prediction experimentally, we 
e, Time course of CAC metabolite levels during myocardial ischaemia and 
reperfusion in the ex vivo heart (n = 4). f, CAC metabolite levels during in vivo 
myocardial IR in at risk and peripheral heart tissue after ischaemia and 5 min 
reperfusion (n = 5; succinate and fumarate n = 9). g, CAC metabolite levels 
during in vivo brain IR after ischaemia and 5 min reperfusion (n = 3). h, CAC 
metabolite levels during in vivo kidney IR after ischaemia and 5 min reperfusion 
(n = 4; aconitate n = 3). **P < 0.01, ***P < 0.001. P values were calculated 
using two-tailed Student’s t-test for pairwise comparisons, and one-way 
analysis of variance (ANOVA) for multiple comparisons. Data are 
mean ± s.e.m. of at least three biological replicates. 


infused mice with dimethyl malonate, a membrane-permeable precursor 
of the SDH competitive inhibitor malonate’’”° (Extended Data Fig. 7a-c). 
Dimethyl malonate infusion significantly decreased succinate accumu- 
lation in the ischaemic myocardium (Fig. 2f). This result indicates that 
SDH operates in reverse in the ischaemic heart, as inhibition of SDH 
operating in its conventional direction would have further increased suc- 
cinate (Fig. 2a, Extended Data Fig. 6 and Supplementary Tables 5 and 6). 
Therefore, succinate accumulates during ischaemia from fumarate reduc- 
tion by the reversal of SDH. 

Because aspartate is a common carbon source for fumarate in both 
the PNC and the MAS pathways (Fig. 2e), we used ¹³C-labelled aspartate 
to evaluate the contribution of these pathways to succinate production 
during ischaemia. ¹³C-aspartate infusion significantly increased the 
¹³C-succinate content of the ischaemic myocardium compared to normoxia 
(Fig. 2g). In fact, ¹³C-aspartate was the only ¹³C-carbon donor 
that exhibited substantial increased incorporation into succinate during 
ischaemia (Extended Data Fig. 7d). To characterize the relative con- 
tributions of the MAS and PNC to ischaemic succinate accumulation 
we used aminooxyacetate, which inhibits aspartate aminotransferase 
in the MAS” (Fig. 2e) and 5-amino-1-β-D-ribofuranosyl-imidazole-4-carboxamide 
(AICAR), which inhibits adenylosuccinate lyase in the 
PNC'*” (Fig. 2e). Both inhibitors decreased ischaemic succinate levels 
(Fig. 2h). Therefore, our results suggest that during ischaemia both the 
MAS and PNC pathways increase fumarate production, which is then 
converted to succinate by SDH reversal. 

To investigate the potential mechanisms underlying succinate-driven 
mitochondrial ROS production, we modelled in silico changes in isch- 
aemic cardiac metabolism after reperfusion. The simulations predicted 
that SDH oxidizes the accumulated succinate and, with complex III and 
IV at full capacity, drives reverse electron transport (RET) through mitochondrial 
complex I (refs 23-26; Extended Data Fig. 8a-c). Notably, succinate 
drives extensive superoxide formation from complex I by RET 
in vitro, making it a compelling potential source of mitochondrial ROS 
Figure 2 | Reverse SDH activity drives ischaemic succinate accumulation by 
the reduction of fumarate. a, Potential inputs to succinate-directed flux by 
conventional cardiac metabolism and ¹³C-metabolite labelling strategy. 
b, c, ¹³C-isotopologue profile of succinate in the normoxic and ischaemic 
myocardium after infusion of ¹³C-glucose (b) and ¹³C-palmitate (c) (n = 4). 
ND, not detected. d, Effect of inhibition of GABA shunt with vigabatrin on 
GABA and succinate levels in the ischaemic myocardium (n = 4; ischaemia 
n = 5). e, Summary of in silico metabolic modelling of potential drivers of 
ischaemic succinate accumulation, and ¹³C-aspartate metabolic labelling 
strategy. AOA, aminooxyacetate; AS, adenylosuccinate; IMP, inosine 
5'-monophosphate; OAA, oxaloacetate; QH, dihydroubiquinone. f, Effect of 


during IR’**°. However, the role of complex I RET in IR injury has never 
been demonstrated. To test whether the succinate accumulated during 
ischaemia could drive complex I RET on reperfusion, we tracked mito- 
chondrial ROS with the fluorescent probe dihydroethidium (DHE), 
and mitochondrial membrane potential from the potential-sensitive fluo- 
rescence of tetramethylrhodamine methyl ester (TMRM), in a primary 
cardiomyocyte model of IR injury”. DHE was rapidly oxidized after reper- 
fusion, consistent with increased superoxide production” (Fig. 3a). 
Inhibition of SDH-mediated ischaemic succinate accumulation with 
dimethyl malonate reduced DHE oxidation on reperfusion (Fig. 3a). To 
SDH inhibition by dimethyl malonate on CAC metabolite abundance in 
the ischaemic myocardium in vivo (n = 3). g, Relative incorporation of 
¹³C-aspartate to the indicated CAC metabolites in the normoxic and ischaemic 
myocardium (n = 4). h, Effect on CAC metabolite abundance in the ischaemic 
myocardium in vivo of blocking aspartate entry into the CAC through 
aminooxyacetate-mediated inhibition of aspartate aminotransferase, or 
blocking PNC by inhibition of adenylosuccinate lyase with AICAR (n = 3). 
*P < 0.05, **P < 0.01, ***P < 0.001 (two-tailed Student’s t-test for pairwise 
comparisons, one-way ANOVA for multiple comparisons). Data are 
mean ± s.e.m. of at least three biological replicates. 


assess the role of succinate in driving ROS production further, we used 
a cell-permeable derivative of succinate, dimethyl succinate, which is 
readily taken up by cells, where it is then hydrolysed thereby increasing 
succinate levels (Extended Data Fig. 7b, c). Addition of dimethyl succi- 
nate to ischaemic primary cardiomyocytes significantly amplified reper- 
fusion DHE oxidation, suggesting that succinate levels controlled the 
extent of reperfusion ROS (Fig. 3b). Importantly, selective inhibition of 
complex I RET with rotenone (Fig. 3c and Extended Data Fig. 9a) or the 
mitochondria-targeted S-nitrosothiol MitoSNO®* (Fig. 3c) abolished 
both ischaemic succinate and dimethyl succinate-driven DHE oxida- 
tion after reperfusion, indicating that ischaemic succinate levels drove 
superoxide production through complex I RET. Succinate-dependent 


Figure 3 | Ischaemic succinate levels control ROS production in adult 
primary cardiomyocytes and in the heart in vivo. a,b, DHE oxidation during 
late ischaemia and early reperfusion, with/without inhibition of ischaemic 
succinate accumulation (no additions n = 6; dimethyl malonate n = 5) (a) or 
addition of dimethyl succinate during ischaemia (n = 6) (b). c, Inhibition of 
mitochondrial complex I RET reduces DHE oxidation on reperfusion after 
addition of dimethyl succinate (n = 5; dimethyl succinate n = 6). d, Effect 
of dimethyl malonate on mitochondrial re-polarization at reperfusion as 
determined by the rate of TMRM quenching (n = 3). e, Effect of dimethyl 
succinate and oligomycin on mitochondrial ROS in aerobic C2C12 myoblasts 
(n = 4). AU, arbitrary units. f, g, Effect of inhibition of ischaemic succinate 
accumulation by dimethyl malonate on mitochondrial ROS during IR injury 
in vivo assessed by MitoB oxidation (n = 5; dimethyl malonate n = 6) (f), 
and by aconitase inactivation (n = 4) (g). *P < 0.05, **P < 0.01 (two-tailed 
Student’s t-test for pairwise comparisons, one-way ANOVA for multiple 
comparisons). Data are mean ± s.e.m. of at least three biological replicates. 
For cell data replicates represent separate experiments on independent 
cell preparations. 
RET was further supported by the observation that NAD(P)H oxidation 
at reperfusion was suppressed by increasing succinate levels with 
dimethyl succinate (Extended Data Fig. 9b, c). Tracking the mitochon- 
drial membrane potential revealed that inhibition of ischaemic succinate 
accumulation with dimethyl malonate slowed the rate of mitochondrial 
repolarization after reperfusion (Fig. 3d and Extended Data Fig. 9d-f), 
consistent with accelerated repolarization, and RET at complex I, driven 
by succinate on reperfusion. Increasing succinate in C2C12 mouse myo- 
blast cells with dimethyl succinate while hyperpolarizing mitochondria 
with oligomycin increased oxidation of the mitochondrial ROS indicator 
MitoSOX independently of IR (Fig. 3e), suggesting that combining high 
succinate levels with a large protonmotive force is sufficient to drive 
complex I ROS production by RET. 

We next investigated whether succinate-driven complex I RET leads 
to ROS production in the heart in vivo, during IR injury. To do this we 
used the ratiometric mass spectrometric mitochondria-targeted ROS 
probe MitoB*. This probe is rapidly taken up by mitochondria in the 
heart in vivo and then oxidized to MitoP by hydrogen peroxide and 
peroxynitrite. Consequently measuring the MitoP/MitoB ratio by liquid 
chromatography-tandem mass spectrometry (LC-MS/MS) indicates 
changes in mitochondrial ROS in vivo’. At the onset of cardiac reper- 
fusion there was an increase in the MitoP/MitoB ratio, and this increase 
was prevented by blocking the accumulation of ischaemic succinate with 
dimethyl malonate (Fig. 3f). Furthermore, the activity of the mitochon- 
drial superoxide-sensitive CAC enzyme aconitase was decreased in the 
first few minutes of reperfusion, and this oxidative damage was also pre- 
vented by infusing dimethyl malonate during ischaemia to prevent suc- 
cinate accumulation (Fig. 3g). Together, these data indicate that succinate 
oxidation after reperfusion drives a burst of mitochondrial ROS pro- 
duction from complex I by RET during cardiac IR injury in vivo, and that 
this ROS production is prevented by dimethyl malonate. 
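
The MitoP/MitoB readout described above is a simple ratio of oxidized probe to remaining probe in each sample, compared between treatment groups. A minimal sketch of that arithmetic follows; the sample values are invented placeholders, the function names are hypothetical, and any internal-standard normalisation that an LC-MS/MS workflow might apply is deliberately left out.

```python
from statistics import mean


def mitop_mitob_ratio(mitop, mitob):
    """Ratio of oxidized product (MitoP) to remaining probe (MitoB) in one sample."""
    if mitob <= 0:
        raise ValueError("MitoB signal must be positive")
    return mitop / mitob


def group_mean_ratio(samples):
    """Mean MitoP/MitoB ratio over (mitop, mitob) pairs from one treatment group."""
    return mean(mitop_mitob_ratio(p, b) for p, b in samples)


# Placeholder (MitoP, MitoB) signals in arbitrary units.
reperfusion = [(4.0, 100.0), (6.0, 100.0)]
reperfusion_plus_malonate = [(2.0, 100.0), (3.0, 100.0)]

print(group_mean_ratio(reperfusion), group_mean_ratio(reperfusion_plus_malonate))
```

In this toy input the treated group has the lower mean ratio, which is the direction of change reported for dimethyl malonate in Fig. 3f; significance testing is not part of the sketch.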

Our findings suggest the following model (Fig. 4a): during ischaemia, 
fumarate production increases, through activation of the MAS and PNC, 


and is then reduced to succinate by SDH reversal. After reperfusion, the 
accumulated succinate is rapidly oxidized to maintain the Q pool reduced, 
thereby sustaining a large protonmotive force by conventional electron 
transport through complexes III and IV to oxygen, while also driving RET 
at complex I to produce the mitochondrial ROS that initiate IR injury”. 
This model provides a unifying framework for many hitherto uncon- 
nected aspects of IR injury, such as the requirement for time-dependent 
priming during ischaemia to induce ROS upon reperfusion, protection 
against IR injury by the inhibition of complexes I (ref. 8) and II (ref. 28), 
and by mild uncoupling”. 

Notably, our model also generates an unexpected, but testable, pre- 
diction. Manipulation of the pathways that increase succinate during 
ischaemia and oxidize it on reperfusion should determine the extent of 
IR injury. Because the reversible inhibition of SDH blocks both succi- 
nate accumulation during ischaemia (Fig. 2f) and its oxidation upon 
reperfusion, it should protect against IR injury in vivo. Intravenous infusion 
of dimethyl malonate, a precursor of the SDH inhibitor malonate, 
during an in vivo model of cardiac IR injury was protective (Fig. 4b, c). 
Importantly, this cardioprotection was suppressed by adding back dime- 
thyl succinate (Fig. 4b, c and Extended Data Fig. 10a), which restored 
increased levels of ischaemic succinate (Fig. 4d), indicating that pro- 
tection by dimethyl malonate resulted solely from blunting succinate 
accumulation. Finally, intravenous infusion of dimethyl malonate during 
rat transient middle cerebral artery occlusion (tMCAO), an in vivo 
model of brain IR injury during stroke, also suppressed ischaemic accu- 
mulation of succinate (Fig. 4e and Extended Data Fig. 10b) and was pro- 
tective, reducing the pyknotic nuclear morphology and vacuolation of 
the neuropil (Extended Data Fig. 10c), decreasing the volume of infarcted 
brain tissue caused by IR injury (Fig. 4f, g), and preventing the decline in 
neurological function and sensorimotor function associated with stroke 
(Fig. 4h and Extended Data Fig. 10d). These findings support our model 
of succinate-driven IR injury, demonstrating that succinate accumulation 
underlies IR injury in the heart and brain, and suggest decreasing 
Figure 4 | NADH and AMP sensing pathways drive ischaemic succinate 
accumulation to control reperfusion pathologies in vivo through 
mitochondrial ROS production. a, Model of succinate accumulation 
during ischaemia and superoxide formation by RET during reperfusion. 

Δp, proton motive force. b, Representative cross-sections from mouse hearts 
after myocardial infarction ± inhibition of ischaemic succinate accumulation 
and reintroduction of ischaemic succinate. Infarcted tissue is white, the rest of 
the area at risk is red, and non-risk tissue is dark blue. c, Quantification of 
myocardial infarct size as described in b (n = 6). d, Effect of intravenous 
infusion of dimethyl succinate in combination with SDH inhibition by 
dimethyl malonate on CAC metabolite abundance in the ischaemic 
myocardium in vivo (n = 4). e, Effect of intravenous infusion of dimethyl 
malonate on succinate accumulation in the ischaemic brain in vivo (n = 4). 
f-h, Protection by dimethyl malonate against brain IR injury in vivo. 
Quantification of brain infarct volume (f) and rostro-caudal infarct distribution 
(g) ± dimethyl malonate after brain IR injury by tMCAO in vivo (untreated 
n = 6; dimethyl malonate n = 4). h, Neurological scores for rats after 
tMCAO ± dimethyl malonate (untreated n = 6; dimethyl malonate n = 4). 
*P < 0.05, **P < 0.01, ***P < 0.001 (two-tailed Student’s t-test for pairwise 
comparisons, and one-way ANOVA (c-e) or two-way ANOVA (f-h) for 
multiple comparisons). Data are mean ± s.e.m. of at least three biological 
replicates, except for h, for which data are median ± confidence interval. 
succinate accumulation and oxidation as a new therapeutic approach 
for IR injury. 

We have demonstrated that the accumulation of succinate, via fuma- 
rate production and reversal of SDH, is a universal metabolic signature 
of ischaemia in vivo. In turn, succinate is a primary driver of the mito- 
chondrial ROS production on reperfusion that underlies IR injury in a 
range of tissues. Ischaemic accumulation of succinate may be of further 
relevance via its role in inflammatory and hypoxic signalling’®. Thus 
succinate could contribute to both the acute pathogenesis of IR injury 
by mitochondrial ROS, and then upon secretion also trigger inflamma- 
tion and neovascularisation”’. This further suggests that mitochondrial 
ROS produced by RET at complex I may normally act as a redox signal 
from mitochondria that responds to changes in electron supply to the 
Q pool and ATP demand, but is grossly over-activated in IR injury. 
Besides determining the metabolic responses that underlie IR injury, these 
data demonstrate that preventing succinate accumulation during isch- 
aemia is protective against IR injury in vivo, suggesting novel therapeu- 
tic targets for IR injury in pathologies such as heart attack and stroke. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Transcript-RNA-templated DNA recombination 
and repair 


Havva Keskin, Ying Shen, Fei Huang, Mikir Patel, Taehwan Yang, Katie Ashley, Alexander V. Mazin & Francesca Storici 


Homologous recombination is a molecular process that has multiple 
important roles in DNA metabolism, both for DNA repair and genetic 
variation in all forms of life’. Generally, homologous recombination 
involves the exchange of genetic information between two identical 
or nearly identical DNA molecules’; however, homologous recom- 
bination can also occur between RNA molecules, as shown for RNA 
viruses’. Previous research showed that synthetic RNA oligonucleo- 
tides can act as templates for DNA double-strand break (DSB) repair 
in yeast and human cells**, and artificial long RNA templates injected 
in ciliate cells can guide genomic rearrangements’. Here we report 
that endogenous transcript RNA mediates homologous recombina- 
tion with chromosomal DNA in yeast Saccharomyces cerevisiae. We 
developed a system to detect the events of homologous recombina- 
tion initiated by transcript RNA following the repair of a chromo- 
somal DSB occurring either in a homologous but remote locus, or in 
the same transcript-generating locus in reverse-transcription-defective 
yeast strains. We found that RNA-DNA recombination is blocked 
by ribonucleases H1 and H2. In the presence of H-type ribonucleases, 
DSB repair proceeds through a complementary DNA intermediate, 
whereas in their absence, it proceeds directly through RNA. The proximity
of the transcript to its chromosomal DNA partner in the same
locus facilitates Rad52-driven homologous recombination during
DSB repair. We demonstrate that yeast and human Rad52 proteins
efficiently catalyse annealing of RNA to a DSB-like DNA end in vitro.
Our results reveal a novel mechanism of homologous recombination 
and DNA repair in which transcript RNA is used as a template for DSB 
repair. Thus, considering the abundance of RNA transcripts in cells, 
RNA may have a marked impact on genomic stability and plasticity. 

Figure 1 | Repair of a chromosomal DSB by transcript RNA.
a, b, Scheme of the trans (a) and cis (b) cell systems used to detect DSB
repair by transcript RNA. AI, artificial intron; HO, homothallic
switching endonuclease; pGAL1, galactose-inducible promoter; RT,
reverse transcriptase. Yellow triangles, cleavage activity by HO
homothallic switching endonuclease; red question marks, hypothesis for
transcript-RNA-templated DSB repair mechanism. c-e, Examples of
replica-plating results (n = 6) from

To investigate the capacity of transcript RNA to recombine with genomic
DNA, we sought to discover whether a chromosomal DSB could
be repaired directly by endogenous RNA in yeast S. cerevisiae cells. We
designed a strategy by which we could induce a DSB in the HIS3 marker
gene and monitor precise repair of the DSB by a homologous messenger
RNA transcript through restoration of HIS3 function, which yields histidine
prototrophic (His⁺) cells (see Methods). We developed two experimental
yeast cell systems, trans and cis, in strains YS-289, 290 and YS-291, 292,
respectively (Extended Data Table 1). The trans system is designed to test
the ability of a spliced (intron-less) antisense his3 transcript from chromosome
III to repair a DSB in a different his3 allele on chromosome XV,
which contains an engineered homothallic switching endonuclease cutting
site (Fig. 1a and Extended Data Fig. 1a, b). The cis system is designed to test
the capacity of the spliced antisense his3 transcript from chromosome III
to repair a homothallic-switching-endonuclease-induced DSB located
inside the intron of the same his3 locus (Fig. 1b and Extended Data Fig. 1c).
In both the trans and cis cell systems, the spliced antisense his3 transcript 
RNA can serve as a homologous template to repair the broken his3 DNA 
and restore its function. However, given the abundance of Ty retrotrans- 
posons in yeast cells, the spliced antisense his3 RNA could potentially 
be reverse transcribed by the Ty reverse transcriptase in the cytoplasm 
to cDNA that could then recombine with the homologous broken his3 
sequence or be captured by non-homologous end joining at the homothallic
switching endonuclease break site to produce His⁺ cells**. To
distinguish DSB repair mediated by the transcript RNA template from 
repair mediated by the cDNA template, we performed the trans and cis 
assays in two yeast strains that contained either a wild-type SPT3 gene 
or its null allele, which prevents Ty transcription and strongly reduces Ty 
transposition and transpositional recombination**”. In both assays, cells 
containing wild-type SPT3 produced numerous His⁺ colonies after DSB
induction (Fig. 1c and Table 1a). As expected, the frequency of His⁺ colonies
in the trans system was significantly higher than that in the cis
system because the his3 transcript is continuously generated in the pres- 
ence of galactose. In contrast, production of the full his3 transcript is 
immediately terminated upon DSB formation in the cis system. This fre- 
quency difference is not specific to the particular genomic loci in which 
the DSBs are induced, as transformation by DNA oligonucleotides (HIS3.F 
and HIS3.R) designed to repair the broken his3 gene produced the same 
frequency of His⁺ colonies in the two systems (Extended Data Tables 2a
and 3), demonstrating that the homothallic switching endonuclease DSB 
stimulates homologous recombination in the trans and cis systems equally 
well. Notably, almost all the His⁺ colonies are dependent on SPT3 function,
indicating that the DSB in his3 is repaired almost exclusively via the cDNA
pathway (Fig. 1c and Table 1a). This finding demonstrates that if an 
actively transcribed gene is broken, it can be repaired using a cDNA tem- 
plate derived from its intact transcript. Moreover, these data also support 
the model in which reverse-transcribed products from any sort of RNA 
can be a significant source of genome modification at DSB sites’®. 
For RNA to recombine with DNA, an intermediate step that is probably
required is the formation of an RNA-DNA heteroduplex. We therefore
deleted the genes coding for ribonuclease (RNase) H1 (RNH1) and/or
the catalytic subunit of RNase H2 (RNH201), which both cleave the RNA
strand of RNA-DNA hybrids’. Remarkably, while deletion of RNH1
slightly increased the frequency of His⁺ colonies in the trans system,
deletion of RNH201 increased the frequency of His⁺ colonies in both
the trans and cis systems, and combined deletion of RNH1 and RNH201
resulted in an even stronger increase of His⁺ colonies in both systems.
Moreover, we detected His⁺ colonies in rnh1 rnh201 cells in the absence
of SPT3 (Fig. 1c and Table 1a). Notably, there were more His⁺ colonies
in cis-system than in trans-system rnh1 rnh201 spt3 cells, and the frequency
of His⁺ colonies observed in rnh1 rnh201 spt3 relative to spt3 cells
was much higher in cis (>69,000-fold) than in trans (>6,400-fold) (Fig. 1c and
Table 1a). If DSB repair in rnh1 rnh201 spt3 cells were due to cDNA,
we would expect a higher His⁺ frequency in the trans than in the cis
system, as observed in wild-type cells. The fact that the His⁺ frequency
is higher in the cis system suggests that DSB repair is not mediated by
cDNA but instead by RNA or predominantly RNA. To further examine
the possibility that residual cDNA rather than transcript RNA is responsible
for his3 correction in cis-system rnh1 rnh201 spt3 cells, we introduced
a trans system directly into these cells and into the control cis wild-type
cells. When wild-type cells of the cis system were transformed with a
low-copy-number plasmid carrying the pGAL1-mhis3-AI cassette, where
AI represents an artificial intron (BDG606; see Methods), they displayed
a large (a factor of 4,000) increase in the His⁺ frequency following DSB
induction in his3 compared to the same cells transformed with the control
empty vector (BDG283). In contrast, BDG606 in cis-system rnh1 rnh201
spt3 cells did not significantly increase the His⁺ frequency (Fig. 1d and
Extended Data Table 4). These results argue against the role of residual
cDNA in template-dependent DSB repair in cis-system rnh1 rnh201 spt3
cells and support a predominant, direct template function of the cis-system
his3 transcript RNA in these cells. Overall, these data support the con- 
clusion that a transcript RNA can directly repair a DSB in cis-system 
rnh1 rnh201 and rnh1 rnh201 spt3 cells. The physical proximity of the 
his3 transcript to its own his3 DNA during transcription could facilitate 
annealing of the broken DNA ends to the transcript. This possibility is 
consistent with the fact that closer donor sequences repair DSBs more 
efficiently'*”* and that mature transcript RNAs are exported rapidly to 
the cytoplasm or degraded after completion of transcription”. 
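
The lower-bound fold increases quoted above for rnh1 rnh201 spt3 relative to spt3 cells can be re-derived from the median frequencies in Table 1a. The following is only an illustrative sketch: it assumes that an entry reported as <0.1 can be stood in for by 0.1 (so each ratio is a minimum), and the names DETECTION_LIMIT, MEDIAN_HIS_FREQ and min_fold_increase are introduced here for illustration and do not come from the paper.

```python
# Median His+ frequencies per 10^7 viable cells, copied from Table 1a.
# Entries reported as "<0.1" are replaced by 0.1 (an assumption of this sketch),
# so every ratio computed below is a lower bound.
DETECTION_LIMIT = 0.1

MEDIAN_HIS_FREQ = {
    "trans": {"spt3": DETECTION_LIMIT, "rnh1 rnh201 spt3": 642.0},
    "cis": {"spt3": DETECTION_LIMIT, "rnh1 rnh201 spt3": 6920.0},
}


def min_fold_increase(system):
    """Lower bound on the His+ fold increase of rnh1 rnh201 spt3 over spt3."""
    row = MEDIAN_HIS_FREQ[system]
    return row["rnh1 rnh201 spt3"] / row["spt3"]


for system in ("trans", "cis"):
    print(f"{system}: > {min_fold_increase(system):,.0f}-fold")
```

The ratio of the two lower bounds is about ten, which is the cis-over-trans bias the argument relies on.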

Table 1 | Frequencies of cDNA and transcript-RNA-templated DSB repair in trans and cis systems

a
Genotype            trans His⁺ freq.          trans survival   cis His⁺ freq.            cis survival
Wild type           12,300 (10,000-14,600)    1.1%             2,100 (1,800-2,700)       0.7%
spt3                <0.1 (0-8)                8%*              <0.1 (0-0)                48%
rnh201              33,000 (30,400-42,200)    0.7%             15,800 (11,800-18,300)    0.6%
rnh201 spt3         <0.1 (0-5)                8%               <0.1 (0-0)                7%
rnh1                20,610 (17,100-23,900)    0.8%             1,780 (1,200-2,600)       0.5%
rnh1 spt3           <0.1 (0-5)                9%               <0.1 (0-10)               45%
rnh1 rnh201         69,000 (58,600-76,500)    1%               75,000 (57,900-82,100)    0.5%
rnh1 rnh201 spt3    642 (590-800)             11%              6,920 (5,840-7,900)       6%

b (cis system)
Genotype                    His⁺ freq.                Survival
Wild type                   1,640 (1,200-1,850)       1%
rad52                       <0.1 (0-0)                0.2%
rad51                       5,700 (4,170-8,150)       0.4%
rnh1 rnh201                 74,600 (64,900-84,000)    0.6%
rnh1 rnh201 rad52           1,520 (970-2,580)         0.1%
rnh1 rnh201 rad51           74,540 (55,130-87,530)    0.09%
rnh1 rnh201 spt3            7,560 (5,720-11,300)      75%
rnh1 rnh201 spt3 rad52      520 (300-1,100)           0.3%
rnh1 rnh201 spt3 rad51      31,560 (12,910-39,220)    0.6%

a, Result of RNase H defects on DSB repair by cDNA and transcript RNA. Frequencies of His⁺ colonies per 10⁷ viable cells for yeast strains of the trans and cis systems following 48 h galactose treatment are shown
as median and 95% confidence interval (in brackets). Percentage of cell survival after incubation in galactose is also shown. There were 26 repeats for wild type, 12 for spt3, rnh201, rnh201 spt3, rnh1 and rnh1 spt3,
24 for rnh1 rnh201 in both trans and cis systems, 24 for trans-system rnh1 rnh201 spt3 and 18 for cis-system rnh1 rnh201 spt3. b, Result of recombination defects on DSB repair by cDNA and transcript RNA.
Frequencies of His⁺ colonies per 10⁷ viable cells for different rad52 and rad51 mutant strains of the cis system following 48 h galactose treatment are shown as median and 95% confidence interval (in brackets).
There were 12 repeats for wild type, rnh1 rnh201 spt3, rnh1 rnh201 rad52 and rnh1 rnh201 spt3 rad52, and 6 for rad52, rnh1 rnh201, rad51, rnh1 rnh201 rad51 and rnh1 rnh201 spt3 rad51. Percentage of cell
survival after incubation in galactose is also shown. For the significance of comparisons between the strains in the trans and the cis systems, and between different strains of the trans or the cis systems, that is
between-group and within-group analysis, we used the two-tailed Mann-Whitney U-test (see Supplementary Table 1a, b).

*Cells with the spt3-null allele have higher survival than wild-type SPT3 cells after DSB induction because they spend more time in G2 (see Extended Data Fig. 2c).

To confirm that inactivation of RNases H1 and H2 allows for direct
transcript RNA repair of a DSB in homologous DNA, we conducted a
complementation test in the cis system using a vector expressing either
a catalytically inactive mutant of RNH201, rnh201(D39A)”, or wild-type
RNH201. Results showed that when wild-type RNH201 was expressed
from the plasmid in rnh1 rnh201 spt3 cells, there were no His⁺ colonies
following DSB induction (Extended Data Fig. 2a). Deletion of SPT3 is a
well-established and robust method to suppress reverse transcription
and formation of cDNA in yeast**”. However, to prove that the increased
frequency of His⁺ detected in the cis- relative to the trans-system rnh1
rnh201 spt3 background was not solely linked to SPT3 deletion, we
impaired cDNA formation by deleting the DBR1 gene, which codes for
the RNA debranching enzyme Dbr1 (refs 16, 17), or by using the reverse
transcriptase inhibitor foscarnet (phosphonoformic acid)’*. Results shown
in Fig. 1c and Extended Data Table 5a support our conclusion that RNA
transcripts can directly repair a DSB in chromosomal DNA without being
first reverse transcribed into cDNA in rnh1 rnh201 cells.

Efficient generation of His⁺ colonies in cis wild-type, rnh1 rnh201, or
rnh1 rnh201 spt3 cells requires transcription and splicing of the antisense
his3 and DSB formation in the his3 gene. Deletion of pGAL1 (the
galactose-inducible promoter) upstream of his3 on chromosome III,
deletion of the homothallic switching endonuclease gene, or growing
cells in glucose medium, in which homothallic switching endonuclease
is repressed, drastically decreased His⁺ frequency (Extended Data Fig. 2b, c
and Extended Data Table 5b, c). Similarly, yeast wild-type, rnh1 rnh201
and rnh1 rnh201 spt3 cells of the cis system containing a 23-base-pair
truncation of the artificial intron in his3 lacking the 5’ splice site (Extended
Data Table 1 and Extended Data Fig. 1c) produced no His⁺ colonies
following DSB induction (Fig. 1e and Extended Data Table 5d), yet these
cells were efficiently repaired by HIS3.F and HIS3.R synthetic oligonucleotides,
indicating that the DSB occurred in these cells (Extended Data
Table 3).

Next, to examine whether DSB repair frequencies at the his3 locus in 
the trans and cis systems correlate with the expression level of antisense 
his3 transcript, we performed quantitative real-time PCR (qPCR). The 
qPCR data showed that with increased time of incubation in galactose 
medium (from 0.25 to 8 h) the trans strains had significantly more his3 
RNA than the cis strains in all backgrounds, including the rnh1 rnh201 
spt3 strain. Furthermore, the levels of his3 transcript dropped signifi- 
cantly from 0.25 to 8 h in galactose in cis but not in trans strains, except 
for the cis strain in which the homothallic switching endonuclease gene 
was deleted (Extended Data Fig. 2d). These results are expected in the 
cis strains because as soon as the homothallic switching endonuclease 
DSB is made, a full his3 transcript cannot be generated. Therefore, these 
data corroborate the conclusion that the higher frequency of His⁺ colonies
obtained in cis- than in trans-system rnh1 rnh201 spt3 cells (Fig. 1c
and Table 1a) is not due to more abundant and/or more stable transcript 
but rather to the proximity of the transcript to the target DNA. 

PCR analysis of ten random His⁺ colonies from each of the trans- and
the cis-system rnh1 rnh201 spt3 backgrounds, and Southern blot analysis 
of three samples from each background showed that the his3 locus that 
was originally disrupted by the homothallic switching endonuclease site 
(trans background), or by the intron with the homothallic switching endo- 
nuclease site (cis background), was indeed corrected to an intact HIS3 
sequence. No integration of the HIS3 gene at the homothallic switching 
endonuclease site or elsewhere in the genome was detected in tested clones 
(20 of 20), excluding possible mechanisms of repair via capture of cDNA 
by end joining or via transposition (Fig. 2a and Extended Data Figs 3 and 
4a-c). We also excluded the possibility that double deletion of RNH1 and
RNH201 resulted in increased level of Ty transposition. In fact, results 
presented in Extended Data Table 6 show transposition rates a factor 
of 3-14 lower in null rnh1 rnh201 than in wild-type cells. This could be 
due to an increase of non-productive Ty RNA-DNA substrates for the 
Ty integrase, resulting in abortive integrations and/or titration of the 
enzyme. Sequence analysis of 24 random His⁺ colonies from the cis-
system rnh1 rnh201 spt3 background revealed that all 24 clones had the 
same precise sequence as the spliced antisense his3 transcript and did 
not present a typical end joining pattern with small insertion, deletion 
or substitution mutations (Extended Data Fig. 1c and Extended Data 
Table 2b). These results, together with our observation of no His⁺ colony
formation in cells unable to splice the intron in his3 (Fig. 1e and Extended
Data Table 5d), strongly support a homologous recombination
mechanism of DSB repair by transcript RNA in cis-system rnh1 rnh201 
spt3 cells.

Figure 2 | Transcript-templated DSB repair follows a homologous
recombination mechanism. a, Southern blot analysis of yeast genomic
DNA derived from trans wild-type His⁻ (lane 2) or His⁺ (lane 3),
rnh1 rnh201 spt3 His⁻ (lane 4) or His⁺ (lanes 5-7) cells, digested with BamHI
restriction enzyme and hybridized with the HIS3 probe, or derived from cis
wild-type His⁻ (lane 8) or His⁺ (lane 9), rnh1 rnh201 spt3 His⁻ (lane 10) or
His⁺ (lanes 11-13) cells, digested with NarI restriction enzyme and hybridized
with the HIS3 probe (Extended Data Fig. 4a, c). Lanes 1 and 14, 1-kilobase DNA
ladder visible in the ethidium-bromide-stained gel (Extended Data Fig. 4b).
Size of digested DNA bands is indicated by red arrows. bp, base pairs.
b, Experimental scheme of Rad52-promoted annealing between RNA and
DNA in vitro. Asterisk denotes ³²P label. ssDNA (named no. 211) or ssRNA
(no. 501) oligonucleotides are in black; DNA oligonucleotides no. 508 and
no. 509, forming double-stranded DNA (dsDNA), are in blue and green,
respectively. Sequences of oligonucleotides no. 201, no. 501, no. 508 and no. 509
are shown in Extended Data Table 2a. c, d, The kinetics of annealing promoted
by yeast Rad52 (c) and human RAD52 (d). Nucleoprotein complexes were
assembled between dsDNA (no. 508 and no. 509) with an ssDNA protruding
tail (0.4 nM) and either yeast or human Rad52 (1.35 nM) in the presence
(dashed lines) or absence (solid lines) of yeast or human RPA (2 nM).
Annealing was initiated by addition of ³²P-labelled ssRNA or ssDNA (0.3 nM).
The kinetics of protein-free annealing reactions are indicated by open squares
and circles. The error bars represent the standard error of the mean, n = 4.
For the significance of comparisons between the last two time points we used
the two-tailed Mann-Whitney U-test. P values are given in Supplementary
Table 1c.


Previous studies showed the ability of Escherichia coli RecA to pro- 
mote pairing between duplex DNA and single-strand RNA in vitro. 
Recent work suggests that Rad51 (the homologous protein to bacterial 
RecA) can promote formation of RNA-DNA hybrids in yeast”. Here we 
show that transcript-RNA-directed chromosomal DNA repair is stimu- 
lated by the function of Rad52 but not Rad51 recombination protein”. 
Rad52 is important for homologous recombination both via single-strand 
annealing and via strand invasion’. DSB repair by transcript RNA was 
reduced over 14-fold in cis-system rnh1 rnh201 spt3 rad52 but was increased
by a factor of 4 in cis-system rnh1 rnh201 spt3 rad51 compared to rnh1 
rnh201 spt3 cells (Table 1b). Notably, our in vitro experiments demon- 
strate that both yeast and human Rad52 efficiently promote annealing 
of RNA to a DSB-like DNA end (Fig. 2b-d and Extended Data Fig. 4d-h).
Importantly, Rad52 catalyses the reaction with RNA at nearly the same rate 
as the reaction with single-stranded DNA (ssDNA) of the same sequence. 
Moreover, in our experiments replication protein A (RPA), a ubiqui- 
tous ssDNA binding protein’, caused a moderate inhibition of Rad52- 
promoted annealing between complementary ssDNA molecules, but 
not between ssRNA and ssDNA molecules. Thus, in the presence of RPA, 
the annealing between ssRNA and ssDNA proceeded with higher effi- 
ciency than the reaction between ssDNA molecules (Fig. 2b-d and Ex- 
tended Data Fig. 4d-g). 
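
For orientation, the in vivo fold changes quoted earlier in this paragraph (a reduction of over 14-fold without RAD52, an increase by a factor of about 4 without RAD51) follow directly from the median frequencies in Table 1b. The sketch below simply takes ratios of those medians; the dictionary and function names are illustrative, and a ratio of medians says nothing about statistical significance, which the paper assesses separately with the Mann-Whitney U-test.

```python
# Median His+ frequencies per 10^7 viable cells in the cis system, from Table 1b.
MEDIAN_HIS_FREQ_CIS = {
    "rnh1 rnh201 spt3": 7560.0,
    "rnh1 rnh201 spt3 rad52": 520.0,
    "rnh1 rnh201 spt3 rad51": 31560.0,
}


def fold_change(mutant, reference="rnh1 rnh201 spt3"):
    """Ratio of the mutant median to the reference median (>1 means an increase)."""
    return MEDIAN_HIS_FREQ_CIS[mutant] / MEDIAN_HIS_FREQ_CIS[reference]


# Express the rad52 effect as a reduction factor and the rad51 effect as an increase.
rad52_reduction = 1.0 / fold_change("rnh1 rnh201 spt3 rad52")
rad51_increase = fold_change("rnh1 rnh201 spt3 rad51")

print(f"rad52: {rad52_reduction:.1f}-fold reduction")
print(f"rad51: {rad51_increase:.1f}-fold increase")
```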

Figure 3 | Models of transcript-RNA-templated DSB repair in cis. An
actively transcribed DNA region experiencing a DSB uses its own transcript
RNA as a bridging (a) or an extension (b) template for repair. The small black
lines indicate initial annealing between the transcript RNA and the DSB end(s),
and between the two DSB ends. Orange circles, Rad52; green triangles,
RNase H1 and H2 (H1/2).

In vivo, cDNA and/or RNA-dependent DSB repair may be especially
important in the absence of functional Rad51, a condition that prevents repair by the
uncut sister chromatid via strand invasion”. Indeed, our results show
that deletion of RAD51 increases the frequency of repair by cDNA and/or
RNA (Table 1b). Hence, considering the bias observed for DSB repair
in cis versus trans systems when Ty reverse transcription was impaired, 
we propose a model that in the absence of H-type RNase function, tran- 
script RNA mediates DSB repair preferentially in cis systems via a Rad52- 
facilitated annealing mechanism. In this mechanism, the transcript may 
provide a template that either bridges broken DNA ends to facilitate precise
re-ligation or initiates single-strand annealing via a reverse-transcriptase-
dependent extension of the broken DNA ends (Fig. 3). The reverse tran- 
scriptase activity could be provided by a replicative DNA polymerase’, 
minimal Ty reverse transcriptase, or both. The current view in the field 
is that RNA-DNA hybrids formed by the annealing of transcript RNA 
with complementary chromosomal DNA either in cis or in trans systems 
are mainly a cause of DNA breaks, DNA damage and genome instability™*. 
Here we demonstrate that under genotoxic stress, transcript RNA is 
recombinogenic and can efficiently and precisely template DNA repair 
in the absence of H-type RNase function in yeast. In the central dogma 
of molecular biology, the transfer of genetic information from RNA to 
DNA is considered to be a special condition, which has been restricted 
to retro-elements” and telomeres”*. Our data show that the transfer of 
genetic information from RNA to DNA occurs with an endogenous gene- 
ric transcript (his3 antisense), and is thus a more general phenomenon 
than previously anticipated. In addition, in vitro RNA-DNA annealing 
was markedly promoted not only by yeast but also human RAD52, sug- 
gesting that transcript-RNA-templated DNA repair could occur in human 
cells. RNA transcripts could template DNA damage repair at highly tran- 
scribed loci, in cells that do not divide (lack sister chromatids), or have more 
stable RNA-DNA heteroduplexes, like those defective in RNASEH2 in 
patients with Aicardi-Goutières syndrome”. Our findings lay the groundwork
for future exploration of RNA-driven DNA recombination and
repair in different cell types. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

Thirty years ago it was shown that the non-enzymatic, template-
directed polymerization of activated mononucleotides proceeds read- 
ily in a homochiral system, but is severely inhibited by the presence 
of the opposing enantiomer’. This finding poses a severe challenge 
for the spontaneous emergence of RNA-based life, and has led to the 
suggestion that either RNA was preceded by some other genetic poly- 
mer that is not subject to chiral inhibition’ or chiral symmetry was 
broken through chemical processes before the origin of RNA-based 
life**. Once an RNA enzyme arose that could catalyse the polymer- 
ization of RNA, it would have been possible to distinguish among the 
two enantiomers, enabling RNA replication and RNA-based evolu- 
tion to occur. It is commonly thought that the earliest RNA polymer- 
ase and its substrates would have been of the same handedness, but 
this is not necessarily the case. Replicating D- and L-RNA molecules 
may have emerged together, based on the ability of structured RNAs 
of one handedness to catalyse the templated polymerization of acti- 
vated mononucleotides of the opposite handedness. Here we develop 
such a cross-chiral RNA polymerase, using in vitro evolution starting 
from a population of random-sequence RNAs. The D-RNA enzyme, 
consisting of 83 nucleotides, catalyses the joining of L-mono- or oli- 
gonucleotide substrates on a complementary L-RNA template, and 
similar behaviour occurs for the L-enzyme with D-substrates and a 
D-template. Chiral inhibition is avoided because the 10⁶-fold rate
acceleration of the enzyme only pertains to cross-chiral substrates. 
The enzyme’s activity is sufficient to generate full-length copies 
of its enantiomer through the templated joining of 11 component 
oligonucleotides. 

A potential advantage of a cross-chiral polymerase is that it offers a 
new mode of recognition between enzyme and substrates that avoids 
Watson-Crick pairing and therefore may provide greater sequence gen- 
erality. Opposing enantiomers of RNA are unable to form contiguous 
base pairs”° and must instead recognize each other through tertiary 
interactions’. Similar to the way a protein polymerase recognizes nuc- 
leic acids, a cross-chiral RNA polymerase might recognize the shape 
of the RNA duplex while being largely indifferent to the identity of the 
bases. Considerable progress has been made in developing D-RNA 
enzymes that polymerize D-RNA substrates*”, but these enzymes have 
strong sequence preferences” that currently preclude the RNA-catalysed 
replication of RNA, a defining function of RNA-based life. 

The search for a cross-chiral RNA polymerase began with a population
of 10¹⁵ random-sequence D-RNAs that were tethered via a flexible
linker to the template strand of a template-primer complex composed
of L-RNA (Fig. 1a). A separate 5’-triphosphorylated, 3’-biotinylated
L-oligonucleotide substrate was provided that could bind to the template 
adjacent to the primer. D-RNA molecules that catalysed ligation of the 
substrate and primer were captured using streptavidin and selectively 
amplified. After ten rounds of this procedure, a catalytic motif was identified
and trimmed of extraneous nucleotides (Extended Data Figs 1a
and 2a). This motif consists of a central core supported by three stem 
regions. 

Next, four unpaired nucleotides within the central core were replaced 
by 30 random-sequence nucleotides (Extended Data Fig. 2b) and six 
additional rounds of selective amplification were carried out. For these 
additional rounds, the population of D-RNAs was tethered to the primer
and both the template and substrate were provided as separate molecules
(Fig. 1b). This was done to encourage the development of catalysts
that would be more general with regard to the reaction format. An
optimized D-enzyme was identified from the final evolved population
(Extended Data Fig. 1b), and again trimmed of extraneous nucleotides
(Extended Data Fig. 2c-e), resulting in an 83-nucleotide motif that catalyses
the ligation of L-RNA oligonucleotides on an L-RNA template
(Fig. 1c). The rate of this reaction is 0.45 min⁻¹ (Extended Data Fig. 3a),
which is approximately 10⁶-fold faster than the uncatalysed rate of
reaction’. 
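
To give a feel for what these numbers imply, the sketch below converts the reported rate of 0.45 min⁻¹ into a half-life and back-calculates the uncatalysed rate implied by the stated acceleration. It rests on two assumptions that are not taken from the paper: that the ligation follows simple first-order (single-exponential) kinetics, and that the acceleration is taken as a round factor of one million. All names are illustrative.

```python
import math

OBSERVED_RATE = 0.45       # min^-1, catalysed ligation rate reported in the text
FOLD_ACCELERATION = 1e6    # assumed round value for the rate enhancement


def half_life(rate_per_min):
    """Half-life in minutes of a first-order process with the given rate constant."""
    return math.log(2) / rate_per_min


def fraction_reacted(rate_per_min, minutes):
    """Fraction converted after `minutes`, assuming single-exponential kinetics."""
    return 1.0 - math.exp(-rate_per_min * minutes)


# Uncatalysed rate implied by dividing the catalysed rate by the acceleration.
uncatalysed_rate = OBSERVED_RATE / FOLD_ACCELERATION
minutes_per_year = 60 * 24 * 365

print(f"catalysed half-life: {half_life(OBSERVED_RATE):.2f} min")
print(f"fraction ligated after 10 min: {fraction_reacted(OBSERVED_RATE, 10):.3f}")
print(f"implied uncatalysed rate: {uncatalysed_rate:.1e} per min")
print(f"implied uncatalysed half-life: {half_life(uncatalysed_rate) / minutes_per_year:.1f} years")
```

Under these assumptions the catalysed reaction has a half-life of minutes, whereas the uncatalysed one would take years.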

Figure 1 | Evolution of a cross-chiral RNA ligase. a, Reaction format during
the first ten rounds of selective amplification, with the D-enzyme tethered to the
L-template-primer complex, and with the 5’-triphosphorylated (ppp), 3’-
biotinylated (B) L-substrate provided separately. L-Nucleotides are shown in
blue. The starting population contained 70 random-sequence nucleotides
(N70), flanked by fixed primer-binding sites (open rectangles). Curved arrow
indicates the site of ligation. b, Reaction format during rounds 11-16 of
selective amplification, with the D-enzyme tethered to the L-primer, and with
both the L-template and L-substrate provided separately. An additional 30
random-sequence nucleotides (N30, green) were inserted before round 11. nt,
nucleotides. c, Sequence and secondary structure of the final evolved enzyme.
Nucleotides that derived from the N30 insert are shown in green.

The RNA enzyme can operate on a separate template-substrate complex,
recognizing that complex through tertiary interactions. The D-enzyme
catalyses the ligation of two L-RNA substrates on an L-RNA
template, and the mirror-image L-enzyme behaves similarly with D-RNA
substrates and a D-RNA template (Fig. 2a). Furthermore, the two en- 
zymes can operate in a common mixture that contains both the L- and 
the D-versions of the substrates and template. The D- and L-enzymes can- 
not interact through Watson-Crick pairing and do not appear to interact 
significantly through cross-chiral contacts. The intermolecular reaction
exhibits saturation kinetics, with a catalytic rate (kcat) of 0.019 min⁻¹
and a Michaelis constant (Km) of 3.3 µM (Extended Data Fig. 4). There
is no detectable reaction when the template-substrate complex is of the
same handedness as the enzyme, even at 50 µM concentration.
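
The saturation behaviour can be made concrete with the two reported constants. The sketch below assumes the standard Michaelis-Menten rate law, which the phrase "saturation kinetics" suggests but which the text does not write out; the function name and the concentrations sampled are illustrative.

```python
K_CAT = 0.019   # min^-1, catalytic rate reported for the intermolecular reaction
K_M = 3.3       # micromolar, Michaelis constant reported in the text


def turnover_rate(substrate_um, k_cat=K_CAT, k_m=K_M):
    """Michaelis-Menten rate per enzyme (min^-1) at a substrate concentration in uM."""
    return k_cat * substrate_um / (k_m + substrate_um)


# Sample concentrations below, at and above the Michaelis constant.
for conc in (0.5, 3.3, 10.0, 50.0):
    rate = turnover_rate(conc)
    print(f"[S] = {conc:5.1f} uM -> {rate:.4f} per min ({rate / K_CAT:.0%} of the maximum)")
```

On this model a cross-chiral complex at 50 µM would react at close to the maximal rate, which puts the absence of any detectable reaction with the same-handed complex at that concentration in context.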

The products of the ligation of two D-RNA substrates were gel puri- 
fied, then subjected to cleavage by RNase A, which cleaves 3',5’- but not 
2',5'-phosphodiester linkages. Cleavage at the ligation junction was com- 
plete, demonstrating that the enzyme forms the ‘natural’ 3’,5'-linkage 
(Extended Data Fig. 5). 

Figure 2 | Cross-chiral ligation and polymerization. a, Template-directed
ligation of two oligonucleotides catalysed by an RNA enzyme of the opposite
handedness. The sequences of the substrates and template are as shown in
Fig. 1b, but with the enzyme detached from the primer. The reactions used
10 µM enzyme, 0.5 µM fluorescently labelled upstream substrate, 4 µM
downstream substrate, 2 µM template, 250 mM MgCl2 and 250 mM NaCl,
which were incubated at pH 8.5 and 23 °C for 0.5, 2 or 8 h. The marker lane (M)
contains the D- and L-upstream substrates alone, labelled with either fluorescein
(green) or boron-dipyrromethene (red), respectively. b, Template-directed
polymerization of L-NTPs catalysed by a D-RNA enzyme. The L-primer was
tethered to the D-enzyme as shown in Fig. 1b and the L-template was provided
separately. All templates had the primer-binding sequence shown in Fig. 1b,
followed by 3’-CCCCAGUA-5’ for GTP addition, 3’-UUUUAGUA-5’ for
adenosine triphosphate (ATP) addition, 3’-GGGGAGUA-5’ for cytidine
triphosphate (CTP) addition, or 3’-AAAAAGUA-5’ for uridine triphosphate
(UTP) addition. The reactions used 0.5 µM enzyme-primer complex, 1 µM
template, and 4 mM of the appropriate NTP, under the same conditions
as described earlier, except at 17 °C for 24 h. The reaction products were
photocleaved to detach the extended primer before analysis by polyacrylamide
gel electrophoresis (PAGE).

Although the enzyme was selected on the basis of templated ligation
activity, this reaction is mechanistically similar to the templated polymerization
of nucleoside 5’-triphosphates (NTPs). Other selected ligases
have shown at least some polymerization activity'*"’, which is the case
here too. The four L-NTPs were prepared by chemical synthesis and
tested in various primer extension reactions with the D-RNA enzyme
and a separate L-RNA template. By providing a template with the sequence
3’-CCCCAGUA-5’ immediately downstream from the primer-
binding site, and supplying 4 mM L-guanosine triphosphate (L-GTP), 
the D-RNA enzyme catalyses four successive GTP additions (Fig. 2b). 
When instead provided with D-GTP there is only a very low level of 
single-nucleotide addition. When provided with a racemic mixture of 
D,L-GTP the results are nearly identical to the reaction with L-GTP alone, 
with an observed rate of 0.11 min⁻¹ in both cases (Extended Data Fig. 3b).
Thus, there is no chiral inhibition in the RNA-catalysed polymerization 
reaction, unlike the situation with the non-enzymatic template-directed 
polymerization of activated mononucleotides'. 

Other template-primer combinations were used to demonstrate the 
ability of the D-RNA enzyme to add each of the four L-NTPs on a com- 
plementary template (Fig. 2b). These experiments revealed that the en- 
zyme does have sequence preferences, with addition to a 3’-terminal C 
or G residue being most efficient and addition to a 3’-terminal A or U 
residue being poor. Addition of GTP to a 3’-terminal C is especially 
efficient and mimics the ligation junction that was used during in vitro 
evolution. No attempt has yet been made to select directly for NTP addi- 
tion or with different sequences surrounding the reaction site. None- 
theless, the current sequence tolerance of the enzyme is sufficient to 
enable the assembly of a variety of enantiomeric RNA products. 

The RNA enzyme appears to be indifferent to the length of the sub- 
strates, so long as they are bound to a complementary template. As a 
demonstration of this property, a mixture of D-mono- and 
oligonucleotides was assembled on two different long D-RNA templates (Fig. 3a, b). 
The first required seven ligations and three NTP additions; the second 
required seven ligations and two NTP additions; and both resulted in 
the synthesis of full-length products. The ladder of 5’-labelled materi- 
als demonstrates that some additions are more efficient than others, 
probably reflecting a mixture of sequence preference, structural con- 
text and competition among substrates. However, there is a clear pro- 
gression of successive additions, culminating in the full-length product. 
The accurate assembly of the full-length materials was confirmed by 
sequence analysis (Extended Data Fig. 6). 

As a final test of the ability of the enzyme to synthesize enantiomeric 
products, the D-RNA enzyme was used to assemble 11 L-oligonucleotides 
to form a mirror copy of itself. The ten ligation junctions had either a C 
or G residue at the 3’ terminus and an A, U or G residue at the 5’ ter- 
minus (Fig. 1b). The ladder of 5’-labelled materials again demonstrates 
successive additions culminating in the full-length product (Fig. 3c). This 
full-length material was gel purified and tested for enzymatic activity 
in a ligation reaction with two D-RNA substrates and a D-RNA template, 
confirming that it is fully functional (Fig. 3d). This is, to our knowledge, 
the first demonstration of an enzyme being synthesized by its enantiomer. 

Biology is overwhelmingly homochiral, with only sparse examples of 
L-sugars and D-amino acids, such as L-arabinose in plant hemicellulose 
and D-alanine in bacterial peptidoglycan. There is no known example 
of a biopolymer containing subunits entirely of the ‘wrong’ handedness. 
This is because the stereochemical handshake between biopolymers 
would seem to demand chiral uniformity. Yet macromolecules of oppo- 
site handedness can interact in their own fashion, including to bring 
about chemical transformations. The advantages of a cross-chiral poly- 
merase for RNA-based life are twofold: first, both enantiomers are used, 
so polymerization does not deplete the supply of the ‘correct’ enantio- 
mer; and second, the interaction between D- and L-RNA does not allow 
consecutive Watson-Crick pairs that can contribute to sequence bias. 

The question remains as to how a chirally pure RNA enzyme would 
arise in the first place, and moreover how there might be both D- and L- 
versions of such an enzyme. One possibility is that RNA-based life was 
preceded by a genetic system based on an achiral polymer*"*, which 
then evolved the ability to synthesize RNA polymers. An achiral cata- 
lyst would generate both D- and L-RNA, but could distinguish between 
the homo- and heterochiral addition of monomers to the growing chain. 
A second possibility is that life began with the non-enzymatic replica- 
tion of either D- or L-RNA’*"®, and subsequently evolved the ability to 
Figure 3 | Cross-chiral assembly of long RNAs. a, Assembly of 50-nucleotide 
and 49-nucleotide D-RNAs on complementary D-RNA templates through 
multiple ligation and polymerization events, catalysed by the L-RNA enzyme. 
The reaction mixtures were sampled at 0, 1, 2 and 3 days and the 5’-labelled 
products were analysed by PAGE in comparison with authentic full-length 
material (M). Numbers on the right indicate the nucleotide length of 
successively assembled components. Dots indicate intermediate-length 
materials resulting from degradation of longer products. See Methods for 
reaction conditions. b, Sequences of substrates and templates used to assemble 
the two RNAs shown in a. Dots indicate the junctions for assembly. c, Assembly 
of the 83-nucleotide L-RNA enzyme on a complementary L-RNA template, 
catalysed by the D-RNA enzyme of the same sequence. The reaction mixture 
was sampled at 0, 1, 3 and 5 days and the products were analysed as above. 
Red dots in Fig. 1c indicate the junctions for assembly, with sequence 
modifications at positions 13, 14, 31 and 32, as shown in Extended Data Fig. 2g. 
d, Catalytic activity of the L-RNA enzyme that had been assembled by the 
D-RNA enzyme. The reaction conditions are as in Fig. 2a, but with 0.5 µM 
enzyme, 0.2 µM upstream substrate, 1 µM downstream substrate, and 0.5 µM 
template. Freact, fraction reacted. 


catalyse the cross-chiral polymerization of RNA. The products of cross- 
chiral polymerization could do so similarly, ultimately displacing the 
chemical replication process. 

The cross-chiral polymerase is still a young enzyme, only 16 rounds 
of selective amplification away from random sequence. However, it has 
auspicious properties that can probably be improved through further 
in vitro evolution. It will be especially important to increase the cata- 
lytic rate of the enzyme and to enhance its ability to extend 3’ termini 
that end in either an A or U residue. The ultimate aim is to achieve cross- 
chiral RNA replication, which would require the enzyme to generate 


442 | NATURE | VOL 515 | 20 NOVEMBER 2014 


both strands of an RNA duplex, that is, both the enantiomeric enzyme 
and its complement. Cross-chiral replication does not require the D- 
and L-enzymes to have the same sequence, and even if initiated with 
enzymes of the same sequence, the two would probably soon drift apart. 
If early life did entail the cross-chiral polymerization of RNA, then 
there would have been an era when both sides of the mirror were in- 
dispensable. Subsequently, however, a key evolutionary innovation may 
have arisen on one side of the mirror, for example, the invention of 
instructed L-polypeptide synthesis by D-RNA. Then the other side of 
the mirror could go dark, leaving biology to follow a homochiral path. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
The Ras-like GTPases RalA and RalB are important drivers of tumour 
growth and metastasis’. Chemicals that block Ral function would be 
valuable as research tools and for cancer therapeutics. Here we used 
protein structure analysis and virtual screening to identify drug-like 
molecules that bind to a site on the GDP-bound form of Ral. The com- 
pounds RBC6, RBC8 and RBC10 inhibited the binding of Ral to its 
effector RALBP1, as well as inhibiting Ral-mediated cell spreading 
of murine embryonic fibroblasts and anchorage-independent growth 
of human cancer cell lines. The binding of the RBC8 derivative BQU57 
to RalB was confirmed by isothermal titration calorimetry, surface 
plasmon resonance and ¹H–¹⁵N transverse relaxation-optimized 
spectroscopy (TROSY) NMR spectroscopy. RBC8 and BQU57 show selectivity 
for Ral relative to the GTPases Ras and RhoA and inhibit tumour 
xenograft growth to a similar extent to the depletion of Ral using RNA 
interference. Our results show the utility of structure-based discovery 
for the development of therapeutics for Ral-dependent cancers. 

More than one-third of human tumours harbour activating RAS 
mutations’, which has motivated extensive efforts to develop inhibitors 
of Ras for cancer therapy. However, therapies directed at interfering with 
post-translational modifications of Ras* had poor clinical performance; 
therefore, efforts shifted to targeting the signalling components down- 
stream of Ras such as the Raf-MEK-ERK mitogen-activated protein 
kinase pathway* and the phosphatidylinositol-3-OH kinase-AKT-mTOR 
pathway”. A third pathway downstream of Ras leads to the activation of 
the Ras-like small GTPases RalA and RalB*, and this pathway has not been 
targeted to date. Active Ral activates cellular processes through effec- 
tors, including Ral-binding protein 1 (RALBP1; also known as RLIP76 
and RIP1)’, the human exocyst subunits SEC5 and EXO84, filamin and 
phospholipase D1 (refs 8-10). These effectors mediate regulation of cell 
adhesion (anchorage independence), membrane trafficking (exocyto- 
sis and endocytosis), mitochondrial fission, and transcription. RalA and 
RalB are important drivers of the proliferation, survival and metastasis 
of multiple human cancers, including skin'’, lung’’, pancreatic’, colon”’, 
prostate’*, and bladder’®’® cancers. 

We set out to discover small molecules that inhibit the intracellular 
actions of the Ral-family GTPases. Our approach was based on the hypoth- 
esis that molecules that selectively bind to Ral-GDP might restrict Ral to 
an inactive state in the cell, making it unavailable to promote processes 
linked to tumorigenesis. Comparing the available three-dimensional 
structures of RalA revealed differences in a region adjacent to, but dis- 
tinct from, the guanine nucleotide binding pocket (Fig. 1). This site is 
formed by the switch-II region (amino acids 70-77), the α2 helix (amino 
acids 78-85) and one face of the α3 helix (Fig. 1a). Its proximity to the 
previously described C3bot binding site’” supports the notion that small 
molecule occupancy at this site could inhibit function. The crystal structures 
used in the comparison included RalA-GDP (Protein Data Bank (PDB) 
ID, 2BOV; Fig. 1a, b) and RalA-GNP (RalA bound to a non-hydrolysable 
form of GTP, the GTP analogue GMP-PNP) in complex with EXO84 
(PDB ID, 1ZC4; Fig. 1c) or SEC5 (PDB ID, 1UAD; Fig. 1d). The volumes 
calculated for this binding site were 175 Å³ for RalA-GDP (Fig. 1b), 
155 Å³ for RalA-GNP-EXO84 (Fig. 1c) and 116 Å³ for RalA-GNP-SEC5 
(Fig. 1d). To the best of our knowledge, a RalB-GDP crystal structure 
is not available. However, in the RalB-GNP structure (PDB ID, 2KE5; 
Extended Data Fig. 1), this binding site is largely absent. Next, we used 
a structure-based virtual screening approach’* to identify small mole- 
cules that bind to this site in RalA-GDP by individually docking 500,000 
compounds to this site (using ChemDiv, v2006.5)”” and by scoring protein- 
ligand complexes based on calculated interaction energies. This process 
led to the selection of 88 compounds. 

We developed an enzyme-linked immunosorbent assay (ELISA) for 
assaying Ral activity in living cells based on the selective binding of active 
RalA-GTP to its effector protein RALBP1. This assay used J82 human 
bladder cancer cells that stably expressed Flag-tagged RalA. The Flag epi- 
tope tag greatly increased the sensitivity and dynamic range of the assay 
compared with using Ral-specific antibodies for detection (Extended 
Data Fig. 2a). Cells were treated with each of the 88 compounds (tested 
at 50 µM), and then extracts were prepared. The binding of Flag-RalA 
to recombinant RALBP1 that had been immobilized in 96-well plates 
was quantified. In this assay, RalA binding reflects Ral’s GTP load- 
ing and capacity for effector activation. The compounds RBC6, RBC8 
and RBC10 (structures shown in Fig. 1e–g) reduced the activation of 
RalA in living cells (Fig. 1h), while compounds RBC5, RBC7 and RBC42 
(structures not shown) had no effect and thus served as negative con- 
trols. None of the 88 compounds inhibited GTP or GDP binding to 
purified recombinant RalA (Supplementary Table 1), which is consis- 
tent with the interaction site being distinct from that used for binding 
guanine nucleotides. 

Another cell-based assay was also used to assess the effects of these 
88 compounds. Ral is required for lipid raft exocytosis and cell spread- 
ing on fibronectin-coated coverslips by murine embryonic fibroblasts 
(MEFs)”°. The depletion of RalA with a specific short interfering RNA 
(siRNA) inhibited the spreading of wild-type MEFs, whereas caveolin- 
deficient (Cav1−/−) MEFs retained the capacity to spread after RalA 
depletion. When the effects of RBC6, RBC8 and RBC10 on cell spreading 
in wild-type and Cav1−/− MEFs were tested, only the wild-type 
MEFs were inhibited (Fig. 1i and Extended Data Fig. 2b). RBC6 and 
RBC8 (but not RBC10) are related structures with the same bicyclic 
core (Fig. 1e–g); specific substitutions gave rise to similar but somewhat 
different binding orientations in the allosteric binding cavity (Extended 
Data Fig. 2c-e). We therefore focused on RBC6 and RBC8 in further 
experiments. 
Figure 1 | Structure-based in silico library screening and cell-based 
secondary screening identified RBC6, RBC8 and RBC10 as lead compounds 
for Ral inhibition. a, b, Structural model of RalA-GDP as a ribbon (a) or 
surface (b) representation. GDP is shown in yellow, Mg²⁺ is shown as a green 
sphere, α-helices are shown in red, and β-sheets are shown in cyan. The red 
sphere and surfaces indicate the water accessible area in the binding cavity. 
All models were generated with Accelrys Discovery Studio software using 
published structures. c, d, Surface representations of RalA-GNP in complex 
with EXO84 (EXO84 not shown) (c) and RalA-GNP in complex with SEC5 
(SEC5 not shown) (d). e-g, Chemical structure of RBC6 (e), RBC8 (f) and 
RBC10 (g). h, RalA ELISA results for the top compounds (RBC6, RBC8 and 
RBC10) and for three ineffective compounds (RBC5, RBC7 and RBC42), as 
identified by computational screening. J82 cells overexpressing Flag-RalA were 
treated with each compound for 1 h and then subjected to a RalA ELISA, as 
described in Methods. Data are presented as the mean ± s.d. of three technical 
replicates and expressed as the percentage of DMSO control. i, Dose response 
effect of RBC6, RBC8 and RBC10 on the RalA-dependent spreading of wild- 
type MEFs. MEFs were treated with 0–15 µM of each compound for 1 h and 
subjected to the MEF-spreading assay, as described in Methods. Data are 
presented as the mean ± s.d. of three technical replicates. 


To test for the direct binding of compounds to Ral, we used ¹H–¹⁵N 
TROSY NMR spectroscopy. The NMR structure of RalB in complex with 
GNP has been solved (PDB ID, 2KE5; Biological Magnetic Resonance 
Bank (BMRB) ID, 15230)’; therefore, we focused on this isoform. First, 
we obtained complete backbone NMR chemical shift assignments for 
the RalB-GDP complex (see Methods), and then we compared the 
¹H–¹⁵N TROSY NMR spectra of RalB-GDP and RalB-GNP to determine 
the chemical shift differences between the GTP-bound and GDP- 
bound states. Almost all of the differences were confined to residues that 
interact with the third phosphate of the GTP (Extended Data Fig. 3a, b). 
¹H–¹⁵N TROSY spectra were then recorded in the presence of the compound 
RBC8 or dimethylsulphoxide (DMSO) as a control, and the chemical 
shift changes were compared. RBC8 induced chemical shift changes 
in RalB-GDP but not in RalB-GNP, indicating that RBC8 shows selectivity 
for the GDP-bound form of Ral (Extended Data Fig. 3c, d). Moreover, 
RBC5, which did not affect the level of active Ral in the cell-based 
ELISA, did not induce chemical shift changes in RalB-GDP (Extended 
Data Fig. 3e), thereby serving as an additional negative control. 

On the basis of all of these data, including the structural features, a 
series of RBC8 derivatives was synthesized and tested for binding in vitro. 
We chose BQU57 for further evaluation because of its superior perfor- 
mance to RBC8 and its drug-like properties (Fig. 2a, Extended Data Fig. 4a 
and synthesis pathway in Supplementary Methods). A detailed NMR 
analysis of the binding between BQU57 and RalB-GDP was carried out. 
The NMR spectrum of RalB-GDP (100 µM) in the absence and presence 
of BQU57 (100 µM) is shown in Fig. 2b. Concentration-dependent 
chemical shift changes for representative residues are shown in Fig. 2c. 
A plot of the chemical shift changes with BQU57 (100 µM) as a function 
of sequence (Fig. 2d) shows that residues that exhibit marked changes 
are located in the switch-II (amino acids 70-77) and α2 helix (amino 
acids 78-85) regions. Because no RalB-GDP crystal structure is available, 
a homology model was generated based on similarity to RalA-GDP, 
and the residues that displayed chemical shift changes in response to 
the compounds were mapped onto this model (Fig. 2e). The majority 
of the chemical shift changes were localized to the allosteric site, 
consistent with assignment of BQU57 binding to this site based on modelling. 
Similar to the results for RBC8, BQU57 (100 µM) did not bind to 
RalB-GNP (100 µM) as indicated by the minimal chemical shift changes 
in the NMR spectrum (Extended Data Fig. 4b). Analysis of the NMR 
chemical shift titrations revealed that the binding of BQU57 was stoichiometric 
up to the apparent limiting solubility of the drug (which was 
estimated as ~100 µM in control experiments without protein) (Extended 
Data Fig. 4c). The binding of BQU57 to RalB-GDP was also determined 
by using isothermal titration calorimetry (ITC), which yielded a dissociation 
constant (Kd) of 7.7 ± 0.6 µM (Fig. 2f). This finding was similar 
to the results from surface plasmon resonance (SPR), which gave a Kd 
of 4.7 ± 1.5 µM (Extended Data Fig. 4d). 
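
To put these dissociation constants in context, the following Python sketch estimates how much RalB-GDP would be occupied at the 100 µM concentrations used in the NMR experiments. It assumes a simple 1:1 binding equilibrium without ligand depletion corrections beyond the standard quadratic solution; the function name `bound_complex` is hypothetical, and the text does not state that this model was used in the analysis.

```python
import math


def bound_complex(protein_total: float, ligand_total: float, kd: float) -> float:
    """Concentration of a 1:1 complex, in the same units as the inputs.

    Solves [PL]^2 - (P + L + Kd)[PL] + P*L = 0 for the physically
    meaningful (smaller) root.
    """
    s = protein_total + ligand_total + kd
    return (s - math.sqrt(s * s - 4.0 * protein_total * ligand_total)) / 2.0


# Kd values from the text (ITC and SPR); concentrations in micromolar.
for label, kd in (("ITC", 7.7), ("SPR", 4.7)):
    complex_um = bound_complex(100.0, 100.0, kd)
    print(f"{label}: Kd = {kd} uM, fraction of RalB-GDP bound = {complex_um / 100.0:.2f}")
```

Under these assumptions, roughly three-quarters to four-fifths of the protein is bound at equimolar 100 µM, which is in line with the marked chemical shift changes described above.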

Next we evaluated the action of RBC8, BQU57 and RBC5 (the last as 
a negative control) on the human lung cancer cell lines H2122, H358, 
H460 and Calu-6. Ral promotes anchorage independence’ therefore, 
we measured cell growth in soft agar. We examined drug uptake and found 
that RBC8, BQU57 and RBC5 were readily taken into cells (Extended 
Data Fig. 5a—c). In addition, we found that all four cell lines were sensi- 
tive to siRNA-mediated depletion of K-RAS (Extended Data Fig. 6a, b) 
but that only H2122 and H358 cells were sensitive to RAL knockdown 
(Extended Data Fig. 6c, d). We used this characteristic to assess the spe- 
cificity of the compounds for inhibiting Ral. Colony formation in soft 
agar showed that the Ral-dependent lines H2122 and H358, but not H460 
or Calu-6, were sensitive to treatment with RBC8 or BQU57 (Fig. 3a, b). 
The half-maximum inhibitory concentration (IC50) of RBC8 was 3.5 µM 
in H2122 cells and 3.4 µM in H358 cells; for BQU57, the IC50 was 2.0 µM 
in H2122 cells and 1.3 µM in H358 cells. The inactive control compound 
RBC5 did not inhibit the growth of any of these cell lines (Extended Data 
Fig. 5d). Additionally, a Ral pull-down assay using RALBP1-bound agarose 
beads* showed that RBC8 and BQU57, but not RBC5, inhibited 
both RalA and RalB activation in both the H2122 and H358 cell lines 
(Extended Data Fig. 5e). 
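
As a rough guide to what the reported IC50 values imply, the sketch below evaluates a standard Hill-type inhibition curve at a few concentrations. The curve shape and the Hill coefficient of 1 are assumptions made for illustration; the text reports only the IC50 values, not the fitting model, and the function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(concentration: float, ic50: float, hill: float = 1.0) -> float:
    """Fraction of colony formation remaining under a Hill-type inhibition model."""
    return 1.0 / (1.0 + (concentration / ic50) ** hill)


# IC50 values (micromolar) as reported in the text for soft agar colony formation.
ic50_um = {
    ("RBC8", "H2122"): 3.5,
    ("RBC8", "H358"): 3.4,
    ("BQU57", "H2122"): 2.0,
    ("BQU57", "H358"): 1.3,
}

for (compound, cell_line), ic50 in ic50_um.items():
    # Evaluate at illustrative doses of 1, 5 and 10 micromolar.
    values = ", ".join(f"{fraction_remaining(dose, ic50):.2f}" for dose in (1.0, 5.0, 10.0))
    print(f"{compound} in {cell_line}: fraction remaining at 1, 5, 10 uM = {values}")
```

By construction, the model returns 0.5 when the dose equals the IC50, so the lower IC50 of BQU57 translates into stronger predicted inhibition at any given dose.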

To further examine the specificity of these compounds for Ral, RALA 
and RALB were knocked down in H2122 and H358 cells with specific 
siRNAs. RBC8 or BQU57 treatment showed no further inhibition of 
colony formation after RAL knockdown (Fig. 3c–f and Extended Data 
Fig. 6e). This supports the conclusion that the inhibition of cell growth 
Figure 2 | Characterization of compounds binding to Ral. a, Chemical 
structure of BQU57. b, Overlay of the ¹⁵N-TROSY spectrum of 100 µM 
RalB-GDP in the absence (black) and presence (magenta) of 100 µM 
BQU57. c, Selected residues 
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represent three independent experiments. AH, 
Figure 3 | Growth inhibitory activity of Ral inhibitors on human cancer cell 
lines. a, b, Effects of RBC8 (a) and BQU57 (b) treatment on the anchorage- 
independent growth of four human lung cancer cell lines. The cells were seeded 
in soft agar containing various concentrations of each compound, and colonies 
were counted after 2-4 weeks. Cell lines that are sensitive to RAL-directed 
knockdown (H2122 and H358) are shown in red, and cell lines that are resistant 
to RAL-directed knockdown (H460 and Calu-6) are shown in black. c-f, Effect 
of siRNA-mediated knockdown of both RALA and RALB (RalA/B) on drug- 
induced growth inhibition in soft agar of H2122 cells (c, d) and H358 cells 
(e, f). Cells were transfected with 10, 30 or 50 nM siRNA for 48 h, collected and 
subjected to the soft agar colony formation assay. The effect of siRNA alone on 
the soft agar colony number is shown in c (H2122) and e (H358); the effect of 
siRNA plus drug treatment on colony formation is shown as the percentage of 
the DMSO-treated control in d (H2122) and f (H358). The control is shown 
in black; 10 nM drug, in red; 30 nM drug, in green; and 50 nM drug, in blue. 
g-j, Effect of the overexpression of constitutively active RalA(G23V) and RalB(G23V) 
on drug-induced growth inhibition in soft agar of H2122 cells (g, h) and 
H358 cells (i, j). H2122 cells or H358 cells were transiently transfected with Flag 
alone (black), Flag-RalA(G23V) (red) or Flag-RalB(G23V) (blue) for 48 h before 
the soft agar colony formation assay. The results in all panels are presented as 
the mean ± s.d. of triplicate experiments. *, P < 0.05, Student’s t-test or 
Dunnett’s test. 
by these compounds depends on Ral proteins. Moreover, overexpression 
of constitutively active (GTP-bound form’) RalA(G23V) or RalB(G23V) 
mutant proteins (Extended Data Fig. 6f), which do not bind to these 
compounds (Extended Data Figs 3d and 4b), mitigated the inhibition of 
H2122 and H358 cell growth by these compounds (Fig. 3g-j and Extended 
Data Fig. 6f). Together, these data provide evidence that RBC8 and BQU57 
act specifically through the GDP-bound form of Ral proteins. 

The inhibition of Ral activity and tumour growth by these compounds 
was evaluated in human lung cancer xenografts in mice. The pharmacokinetics 
of RBC8 and BQU57 were analysed in mice. Serum concentrations 
were determined using liquid chromatography coupled to tandem 
mass spectrometry (LC-MS/MS) after intraperitoneal injection of the 
compound. RBC8 and BQU57 showed properties that define good drug 
candidates (Extended Data Fig. 7a). We then determined compound 
entry to tumour tissue 3 h after dosing, and the compounds were detected 
in tumour tissue in vivo (Extended Data Fig. 7b, c). To test the effect of 
Ral inhibitors on tumour xenograft growth, nude mice were inoculated 
subcutaneously with H2122 (human) cells and treated intraperitoneally 
with 50 mg per kg body weight of RBC8 per day for 21 days (except on 
weekends). RBC8 inhibited tumour growth (Fig. 4a and Extended Data 
Fig. 7d) to a similar extent to dual knockdown of RALA and RALB (Fig. 4b). 
Another lung cancer line, H358, yielded similar results (Extended Data 
Fig. 7e). BQU57 was tested in vivo at several different doses (10, 20 and 
50 mg per kg body weight per day), and dose-dependent growth inhibi- 
tion effects were observed (Fig. 4c). 

To further evaluate the specificity of the compounds for the Ral-family 
GTPases, H2122 tumour xenografts (median size, 250 mm³) were collected 
3 h after a single intraperitoneal injection of RBC5 (50 mg per kg 
body weight), RBC8 (50 mg per kg body weight) or BQU57 (10, 20 or 
50 mg per kg body weight), and the activation of Ral in tumour extracts 
was analysed in RALBP1 pull-down assays. Both RalA and RalB were 
inhibited by RBC8 (Extended Data Fig. 8a-d) and by BQU57 (Fig. 4d) 
but not by the inactive compound RBC5 (Extended Data Fig. 8e, f). By 
contrast, no inhibition of Ras or RhoA activity was observed (Fig. 4d). 

One reason for the failures to obtain clinically useful inhibitors of Ras 
and other related GTPases is the highly conserved guanine nucleotide 
binding site in these GTPases. This site has a high affinity for the gua- 
nine nucleotides GDP and GTP, which are present at millimolar con- 
centrations in cells and would out-compete ligands for this site. Similar 
considerations have delayed the development of protein kinase inhi- 
bitors. Indeed some of the best kinase inhibitors have proved not to be 
competitive with ATP but to be allosteric inhibitors that lock the conformation 
of protein kinases, such as MEK, in a closed state”’. Recently, 
three studies used a similar fragment-based small molecule screening 
approach to identify compounds that bind to sites on the K-Ras surface 
and block its SOS-mediated activation****, suggesting that this approach 
has promise. 

Although our initial library screening was based on the RalA struc- 
ture, the selected compounds also bound to RalB, which is not surpris- 
ing given the similarity of the amino acid sequences and the predicted 
structures. Molecular docking could not be performed on RalB-GDP 
since only the RalB-GNP structure is available. However, NMR experi- 
ments with RalB-GDP demonstrated interactions within the allosteric 
site. Moreover, the selected compounds inhibited the activity of both 
RalA and RalB in cell culture and in human tumour xenografts. Although 
RalA and RalB have been proposed to have distinct roles in tumorigen- 
esis and metastasis'*'*"*, genetically engineered mouse models have 
revealed substantial redundancy for Ral proteins in tumorigenesis’”. 
These results support the clinical utility of compounds that inhibit both 
of these GTPases. Although additional medicinal chemistry optimization 


[Figure 4 graphic. Panels a–c: tumour growth curves plotted against days after inoculation for DMSO versus RBC8 at 50 mg kg⁻¹ (a), control siRNA versus RalA/B siRNA (b) and DMSO versus BQU57 at 10, 20 and 50 mg kg⁻¹ (c). Panel d: RalA, RalB, Ras and RhoA activity blots for DMSO and for 10, 20 and 50 mg kg⁻¹ BQU57, with quantification plotted against BQU57 dose (mg kg⁻¹).] 


Figure 4 | Effect of Ral inhibitors in vivo. a, RBC8 (50 mg per kg body weight 
per day) was administered to mice 24 h after inoculation with the human lung 
cancer cell line H2122, and it inhibited growth of the tumour xenograft. 
b, siRNA depletion of both RalA and RalB inhibited the growth of H2122 
tumour xenografts. The cells were transiently transfected with siRNA for 
24 h before inoculation of nude mice. c, BQU57 treatment (10, 20 or 50 mg 
per kg body weight per day) initiated 24 h after inoculation inhibited the growth 
of H2122 tumour xenografts. The data in a–c are presented as the 
mean ± s.e.m. for groups of six mice. *, P < 0.05, Student’s t-test. d, BQU57 
treatment inhibited the activity of RalA and RalB but not Ras and RhoA in 
H2122 tumour xenografts. Tumour-bearing nude mice were given a single dose 
of 10, 20 or 50 mg per kg body weight BQU57. Tumours were collected 3 h later, 
and the activity of RalA, RalB, Ras and RhoA in tumour lysates was then 
measured using the respective pull-down assay for each GTPase. Immunoblots 
from the activity pull-down assays (top) and the corresponding quantifications 
(bottom) are shown. Each lane represents one tumour sample, and each blot 
represents one treatment group. The last lane in each blot was loaded with 
10 ng recombinant human protein as an internal control for normalization and 
cross-blot comparison. The band intensity on each blot was first normalized to 
the internal control and then compared across different blots. The amount 
of active Ral, Ras or RhoA (bottom) is shown as the percentage of that in 
the DMSO-treated control. Each dot represents one tumour sample, and 
horizontal bars represent the mean of six samples. Colours match those in c. 
is required, these Ral inhibitors represent a first generation of valuable 
tools for elucidating Ral signalling and for developing novel agents for 
cancer therapy. 
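
The two-step normalization described in the Figure 4d legend (each band normalized to the internal control lane on its own blot, then expressed relative to the DMSO group) can be sketched as follows. This is a minimal illustration under stated assumptions: the band intensities are invented placeholder numbers, not data from the paper, and the function names `normalize_blot` and `percent_of_dmso` are hypothetical.

```python
def normalize_blot(band_intensities: list[float], internal_control: float) -> list[float]:
    """Divide each sample band by the internal control lane of the same blot."""
    return [intensity / internal_control for intensity in band_intensities]


def percent_of_dmso(treated: list[float], dmso: list[float]) -> list[float]:
    """Express normalized values as a percentage of the mean normalized DMSO value."""
    dmso_mean = sum(dmso) / len(dmso)
    return [100.0 * value / dmso_mean for value in treated]


# Placeholder intensities (arbitrary units); each blot carries its own control lane.
dmso_blot = {"bands": [200.0, 220.0], "control": 100.0}
treated_blot = {"bands": [52.5, 42.0], "control": 50.0}

dmso_norm = normalize_blot(dmso_blot["bands"], dmso_blot["control"])
treated_norm = normalize_blot(treated_blot["bands"], treated_blot["control"])

for value in percent_of_dmso(treated_norm, dmso_norm):
    print(f"active GTPase: {value:.0f}% of DMSO control")
```

Dividing by the control lane first removes blot-to-blot differences in exposure and transfer, which is what allows samples on separate blots to be compared.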


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Structures of bacterial homologues of SWEET 
transporters in two distinct conformations 


Yan Xu, Yuyong Tao, Lily S. Cheung, Chao Fan, Li-Qing Chen, Sophia Xu, Kay Perry, Wolf B. Frommer & Liang Feng 


SWEETs and their prokaryotic homologues are monosaccharide and 
disaccharide transporters that are present from Archaea to plants and 
humans’ ’. SWEETs play crucial roles in cellular sugar efflux processes: 
that is, in phloem loading’, pollen nutrition’ and nectar secretion®. 
Their bacterial homologues, which are called SemiSWEETs, are among 
the smallest known transporters’. Here we show that SemiSWEET 
molecules, which consist of a triple-helix bundle, form symmetrical, 
parallel dimers, thereby generating the translocation pathway. Two 
SemiSWEET isoforms were crystallized, one in an apparently open 
state and one in an occluded state, indicating that SemiSWEETs and 
SWEETs are transporters that undergo rocking-type movements dur- 
ing the transport cycle. The topology of the triple-helix bundle is 
similar to, yet distinct from, that of the basic building block of animal and 
plant major facilitator superfamily (MFS) transporters (for exam- 
ple, GLUTs and SUTs). This finding indicates two possibilities: that 
SWEETs and MFS transporters evolved from an ancestral triple-helix 
bundle or that the triple-helix bundle represents convergent evolution. 
In SemiSWEETs and SWEETs, two triple-helix bundles are arranged 
in a parallel configuration to produce the 6- and 6 + 1-transmembrane- 
helix pores, respectively. In the 12-transmembrane-helix MFS transporters, 
four triple-helix bundles are arranged into an alternating 
antiparallel configuration, resulting in a much larger 2 x 2 triple-helix 
bundle forming the pore. Given the similarity of SemiSWEETs 
and SWEETs to PQ-loop amino acid transporters and to mitochon- 
drial pyruvate carriers (MPCs), the structures characterized here may 
also be relevant to other transporters in the MtN3 clan’ °. The insight 
gained from the structures of these transporters and from the ana- 
lysis of mutations of conserved residues will improve the understand- 
ing of the transport mechanism, as well as allow comparative studies 
of the different superfamilies involved in sugar transport and the 
evolution of transporters in general. 

Sugars produced by photosynthesis are key energy sources for humans. 
In both plants and animals, sugars are transported across cellular mem- 
branes as a means of distribution throughout the body'®"’. While sugar 
transporters are essential for translocation in plants, human sugar trans- 
porters play critical roles in glucose homeostasis, and mutations in these 
transporters can lead to conditions such as diabetes, glucose malabsorp- 
tion and epilepsy’®”’. Striking similarities exist among the sugar trans- 
porter proteins used by plants and animals. Animal and human genomes 
encode three major classes of sugar transporter: the MFS-type transporters 
of the GLUT family (SLC2 family)", the sodium-dependent glucose trans- 
porters of the SGLT family (SLC5 family)’ and the recently identified 
SWEET and SemiSWEET sugar transporters (SLC50 family)'’. Plant 
genomes contain genes encoding GLUT transporter homologues (in 
particular the STP glucose/H⁺ symporters and the SUT sucrose/H⁺
symporters’*) and the SWEET transporters’. Major breakthroughs in 
understanding the transporter function resulted from solving atomic struc- 
tures of the prototype of the MFS transporter family, lactose permease”, as 
well as of GLUT" and an SGLT”* homologue. MFS and SGLT transporters
have fundamentally different structures: MFS transporters are composed
of four structurally related triple-helix bundles (THBs) arranged in an 
antiparallel format, whereas the structure core of SGLT consists of two 
five-transmembrane-helix bundles in an antiparallel arrangement. 

Until now, there has been limited information on the structure of 
SWEETs and their bacterial homologues, the SemiSWEETs”’. Plant 
SWEETs play crucial roles in intercellular transport and cellular secre- 
tion. Specific isoforms are key for cellular efflux as a first step in phloem 
loading‘, pollen nutrition’ and nectar secretion®, and they also play key 
roles in pathogen susceptibility**"°. The human genome contains a single 
SWEET homologue, which functions as a glucose transporter’. The find- 
ing that the Ciona intestinalis (vase tunicate) SWEET is essential indicates 
that animal and human SWEETs play important roles in physiology’’. 
SWEETs are unique in that eukaryotic isoforms are predicted to be hepta- 
helical with an internal THB repeat’, while prokaryotic SemiSWEET 
polypeptides contain only three transmembrane helices’. 

To determine the structure and function of SemiSWEETs and SWEETs,
two SemiSWEETs were crystallized in different states. The basic unit in
both structures is a THB arranged as a 1-3-2 bundle, and two THBs are
arranged in parallel to form the conduit. Six transmembrane helices are 
thus sufficient to form the pore. Moreover, the detection of two distinct 
states indicates that SemiSWEETs and SWEETs do not function as sugar 
channels but rather as transporters that undergo rocking movements. 
We suggest that the eukaryotic heptahelical SWEETs form a similar 
structure in which a SemiSWEET-like dimer made from the internally 
repeated THB is fused via an inversion linker helix (transmembrane 
helix 4 (TM4)). We also show that pairs of tryptophan and asparagine 
residues in the pore are essential for SemiSWEET and SWEET function. 

The structure of a SemiSWEET from Vibrio sp. N418 (Extended Data 
Fig. 1) was determined at 1.7 Å resolution from crystals grown in the
lipid cubic phase (LCP) (Extended Data Table 1). The protomer of the 
Vibrio sp. SemiSWEET contains three transmembrane helices and a 
non-conserved extra amino-terminal amphipathic α-helix. Within the
protomer, TM3 is sandwiched between TM1 and TM2, and there is 
little direct contact between TM1 and TM2 (Fig. 1). This arrangement 
of transmembrane helices has similarities to the triple-helix repeat of MFS 
transporters’’, although SemiSWEETs and MFSs do not show sequence
homology (Supplementary Discussion and Extended Data Figs 6 and 7b). 
The orientation of the protomer relative to the membrane was inferred 
using the ‘inside-positive rule’, which is in good agreement with the 
observation that the carboxy terminus of Arabidopsis thaliana SWEET11 
is phosphorylated in vivo”. 

In the Vibrio sp. SemiSWEET crystal, two molecules that are related 
by a two-fold axis perpendicular to the membrane tightly interact to 
form a dimer (Fig. 1b and Extended Data Fig. 2). At the dimer interface, 
TM1 of one protomer is packed against TM2 of the other protomer, and
TM2 of the first-mentioned protomer is packed against TM1 of the other
protomer. When viewed from the extracellular side, the backbone of the
dimer forms a basket-like structure, with an opening to the extracellular
side, while the intracellular side is sealed by loops L1-2 (Fig. 1b). Several
lines of evidence support the physiological relevance of dimer forma-
tion. First, the interface between the subunits is extensive, encompassing
~1,970 Å². Second, the dimer is formed in the lipid bilayer environment
of the crystal (Extended Data Fig. 2). The majority of non-packing hydro-
philic residues within the membrane point to the centre of the dimer
interface, compatible with a putative translocation route at the interface of
the subunits. Third, consistent with the structure, Vibrio sp. SemiSWEET
dimerizes in solution and remains a dimer during SDS-PAGE, indicat-
ing stable dimer assembly (Extended Data Fig. 3a). Crosslinked Brady-
rhizobium japonicum SemiSWEET? products also migrate as dimers
(Extended Data Fig. 3b). Together, our structural and biochemical obser-
vations strongly suggest formation of the transport pore of Vibrio sp.
SemiSWEET by a dimer.

From bioinformatic analyses (for example, using the database Pfam),
SemiSWEETs belong to the MtN3 clan and are distantly related to the
PQ-loop family in this clan, a family that is defined by a conserved
PQ-dipeptide motif’”'’. In contrast to the assumption that the PQ motif
is positioned in a loop region, this motif is embedded in the membrane
and is part of TM1 in Vibrio sp. SemiSWEET (Fig. 1d). The role of the
conserved glutamine in SemiSWEETs is revealed by analysing the dimer
interface: the CO and NH moieties of the side chain amide group of Q29
form hydrogen bonds with the NH and CO from two consecutive back-
bone amides immediately N-terminal to TM2 of the other protomer, not
only bringing the L1-2 loop to the dimer interface but stabilizing the
L1-2 loop conformation (Fig. 1d). The proline preceding the glutamine
induces a kink in the helix, probably increasing the flexibility of the trans-
membrane helix, thereby allowing formation of the glutamine-backbone
interaction or potentially facilitating the disruption of the interaction
during the transport cycle.

Figure 1 | Structure of Vibrio sp. SemiSWEET. a, Ribbon representation of a
Vibrio sp. SemiSWEET protomer viewed from the side of the membrane.
Helices in the THB are shown in blue, yellow and red. b, Ribbon representation
of the Vibrio sp. SemiSWEET dimer. One protomer is shown in purple, and the
other is in green. c, A slab view of the Vibrio sp. SemiSWEET dimer
showing the central cavity, coloured according to electrostatic potential. Red
denotes negative potential; blue denotes positive potential; white denotes
neutral potential. d, Close-up view of the PQ motif near the Vibrio sp. dimer
interface. The inter-protomer hydrogen bonds between Q29 on TM1 and
backbone amides on L1-2 are shown as dashed lines. e, Solvent accessible
surface area in the cavity. The solvent accessible surface is shown as a cyan
mesh. The protein is shown as a grey ribbon with the invariant residues W59
and N75 as sticks. For the sticks, carbon is green, nitrogen is blue and oxygen is
red. AH, α-helix; L, loop; TM, transmembrane helix.

The crystal structure indicates that the three-transmembrane-helix 
protomer cannot form an enclosed compartment for substrate transport. 
Instead, there is a solvent-filled cavity between the two protomers at the 
central two-fold axis (Fig. 1e). The cavity traverses approximately half
way across the membrane and is completely separated from the lipid
bilayer by the surrounding six transmembrane helices but remains acces-
sible from the extracellular side (Fig. 1c). This open cavity measures 9.2 Å
at the narrowest point and is sufficient to allow small molecules to freely
diffuse in or out. Of the amino acids lining the cavity, W59 and N75
(Fig. 1e) are the most conserved across species, constituting the only two
invariant residues in 66 analysed SemiSWEET sequences. Both residues 
strategically sit at a similar level above the bottom of the open cavity. 
Their side chains surround the centre of the cavity, forming a putative 
binding pocket, and are most probably within the range to interact with 
substrates given the size and geometry of the cavity. It is noteworthy that 
both residues are also highly conserved in the three-transmembrane-helix 
repeats of SWEETs** and MPC2 (refs 8, 9), supporting their functional 
importance. 

To further investigate the transport mechanism, we focused on a 
SemiSWEET from Leptospira biflexa serovar Patoc that has significant 
homology (44% identity and 63% similarity) to the known sugar trans-
porter B. japonicum SemiSWEET? (Extended Data Fig. 1). The structure
of L. biflexa SemiSWEET was determined at 2.4 Å resolution from crystals
grown in the LCP (Extended Data Table 1). In one asymmetrical unit, 
two molecules tightly interact with each other, forming a dimer in the 
lipid bilayer (Fig. 2 and Extended Data Fig. 4). The dimer interface of 
L. biflexa SemiSWEET is highly similar to that of Vibrio sp. SemiSWEET, 
despite only modest sequence similarity (15% identity), strongly sup- 
porting the notion that the dimeric architecture is a common feature 
of SemiSWEETs. 
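
The identity figures quoted above (44% and 15%) are pairwise statistics computed over an alignment. The following is a minimal sketch of one common way to compute percent identity; it is not the authors' procedure, which the text does not describe. The function name `percent_identity`, the choice to ignore gapped columns in the denominator, and the short toy sequences are all assumptions made for illustration. Percent similarity would additionally need a substitution matrix and is left out.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: columns where either sequence has a gap are excluded from
    both numerator and denominator. Other conventions exist (for example,
    dividing by the length of the shorter sequence).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy, made-up aligned fragments; not SemiSWEET sequences.
    print(percent_identity("ACDE-G", "ACQEFG"))
```

Because the result depends on the alignment and on the denominator convention, figures from different tools are comparable only when those choices match.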

At the interface of two protomers, L. biflexa SemiSWEET contains a 
large cavity immediately above its centre (Fig. 2b). In contrast to Vibrio 
sp. SemiSWEET, the cavity of L. biflexa SemiSWEET is completely sealed 
from solvent. Near the extracellular side, D57 from one protomer forms 
hydrogen bonds with Y51 from the other protomer (Fig. 2c), shielding 
the cavity from the extracellular solution. This structure may explain the 
high conservation of D57 and Y51 across the SemiSWEET, SWEET and 
MPC families***”: these residues form the cap on top of the cavity in the 
‘occluded’ state, and cross-protomer interactions may facilitate the for- 
mation of this conformation. At the centre of the cavity, there is a strong 
non-protein electron density, the identity of which cannot be unambig- 
uously determined at this resolution (Fig. 2d). The flat-shaped density is 
surrounded by W48 and N64 (equivalent to W59 and N75 in Vibrio sp. 
SemiSWEET) from both protomers. The antiparallel aromatic ring of 
tryptophan (W) from each protomer is within 4 Å of the putative substrate
and may interact with and stabilize the putative substrate in the pocket. 
The precise mode of interaction is unclear but possibly involves hydrogen 
bonds and stacking interactions. The asparagine (N) side chains point to
the putative substrate and are in close proximity, probably contributing to
substrate binding. These structural observations implicate W48 on TM2 
and N64 on TM3 as critical in substrate binding and translocation. 

To assess the roles of W48 and N64 in sugar transport, we genera- 
ted alanine substitution mutants of L. biflexa SemiSWEET and tested 
their transport activity. In cell-based radiotracer uptake assays, a glucose- 
uptake-deficient Escherichia coli strain’* expressing wild-type L. biflexa 
SemiSWEET showed a significantly higher glucose uptake than controls, 
consistent with the homology-based prediction that L. biflexa SemiSWEET 
transports sugar. When W48 or N64 was mutated to alanine (W48A or 
N64A), glucose uptake was markedly reduced to a level similar to that 
in controls (Fig. 3a). We did not detect significant glucose uptake activity 
by Vibrio sp. SemiSWEET (data not shown). Furthermore, alanine sub- 
stitutions of the corresponding tryptophan and asparagine in both THB 
repeats in A. thaliana SWEET1 (Extended Data Fig. 1), a glucose trans- 
porter, failed to complement the growth phenotype of a hexose-uptake- 
deficient yeast strain (Fig. 3b). The mutations had no significant effect 
on the plasma membrane localization of SWEET1 in yeast (Extended 
Data Fig. 5). These results demonstrate that tryptophan and asparagine 
play critical roles in sugar transport in both bacterial SemiSWEETs and 
plant SWEETs and further support the notion that SemiSWEETs and 
SWEETs have the same basic architecture. 

Alternating access is the prevailing model for explaining substrate 
translocation by transporters”*”*. Our structural observations support 
an alternating access mechanism in SemiSWEETs. In the crystal, Vibrio 
sp. SemiSWEET adopted an outward open conformation, while L. biflexa
SemiSWEET was captured in an occluded state (Figs 1c and 2b). Although
L. biflexa and Vibrio sp. SemiSWEET have modest sequence identity,
the monomer of L. biflexa SemiSWEET superimposes well onto that of
Vibrio sp. SemiSWEET, with a main-chain root mean squared deviation
(r.m.s.d.) of 1.1 Å over 66 aligned residues (Fig. 4a). By contrast, pro-
nounced differences were observed between dimers of L. biflexa and Vibrio
sp. SemiSWEET (Figs 1b, 2a and 4b). The dimer interface of Vibrio sp.
SemiSWEET opens more towards the extracellular side than L. biflexa
SemiSWEET and is ~10 Å wider at the extracellular surface (Fig. 4b). This
widening is achieved mainly through a ~10° rotation of the protomer
around the part near the intracellular membrane surface (Fig. 4a). To a
lesser extent, a slight bending of the transmembrane helices of L. biflexa
SemiSWEET towards the centre contributes to its more closed con-
formation. The conformational differences between Vibrio sp. and L.
biflexa SemiSWEET indicate a ‘rocker switch’ mechanism and bear
some parallels to structurally unrelated transporter families, such as MFS
transporters and ATP-binding cassette (ABC) transporters, in which
rigid body rocking between the transmembrane subdomains provides
alternating access to the substrate'*’”*°’”. We propose that a similar
rocking-type movement of two SemiSWEET subunits will result in two
additional states, an ‘inward open’ conformation and an ‘occluded, empty’
state, to complete a transport cycle (Extended Data Fig. 7a). It remains
to be determined whether the transport cycle is coupled to proton trans-
fer or operates by facilitated diffusion. SWEETs show properties that
are consistent with facilitated diffusion or a uniport mechanism,
including pH independence and low affinity”*. More detailed func-
tional analysis informed by the structures may help to determine the
exact transport mechanism of SWEETs and SemiSWEETs.

Eukaryotic SWEETs consist of two SemiSWEET-like units fused via an
inversion linker transmembrane helix. Previously, it was unclear whether
SWEETs were large enough to form a pore from a single heptahelical
subunit or whether they would have to form a higher oligomer’. In light
of the SemiSWEET structures with putative substrate binding sites at the
centre of the dimer interface, we propose that the two THBs in a single
SWEET can form the transport route. Mutagenesis analysis of SWEET
is compatible with this hypothesis as it shows the functional conservation
of key residues between SemiSWEET and SWEET proteins. SWEET might
form multibarrelled oligomers as part of a regulatory mechanism. Such
regulation has been observed in GLUT family glucose transporters*, AMT1
family ammonium transporters” and NRT1 family nitrate transporters”®.
Finally, MPCs*? contain a related THB, and our data are consistent with
the observation that two copies of MPC are required to produce a func-
tional transporter, probably by forming a heterodimer.

Figure 2 | Structure of L. biflexa SemiSWEET. a, Two views of the L. biflexa
SemiSWEET dimer in ribbon representation. One protomer is shown in red,
and the other is shown in blue. b, A slab view of the L. biflexa SemiSWEET
dimer, coloured according to electrostatic potential (as in Fig. 1c) and showing
the central cavity. c, Residues capping the cavity. L. biflexa SemiSWEET in
ribbon representation is shown as viewed from the extracellular side. The
hydrogen bonds between Y51 and D57 are shown as dashed lines. d, Two views
of the electron density map of a putative substrate in the cavity. The Fo − Fc
map contoured at 3.0σ is displayed as a red mesh. W48 and N64 are shown as
sticks. TM1 was removed for clarity (left). Stick representation colours are as in
Fig. 1.

Figure 3 | Glucose transport by L. biflexa SemiSWEET and A. thaliana
SWEET1. a, Glucose uptake activity of L. biflexa SemiSWEET in E. coli.
W48A or N64A mutations abolished glucose uptake (control denotes empty
vector). Transport activities were normalized to that of the wild type (WT)
(mean ± s.e.m., n = 3). The uptake by the WT was significantly different
from that by the control or the mutants (two-tailed t-test, P < 0.01).
b, Functional analysis of A. thaliana SWEET1 transport activity in the
hexose-transport-defective yeast strain EBY4000. W56A and N73A (first THB)
and W176A and N192A (second THB) mutants failed to complement the
growth defect of EBY4000 in synthetic medium supplemented with 2% glucose
as the sole carbon source. Growth was unaffected in a control medium
containing 2% maltose. Empty vector and A. thaliana SWEET1 were used as
the negative and positive controls, respectively.

Figure 4 | Alternating access by the rocking-type movement of two
protomers. a, Structural comparison of L. biflexa SemiSWEET and Vibrio sp.
SemiSWEET dimers. Left, protomer A of the two structures was superimposed
and structurally aligned well. Right, protomer B of the two structures
showed ~10° rotation between the protomers when protomer A was
superimposed. b, A ribbon representation of Vibrio sp. SemiSWEET shows that
it opens up more to the extracellular side than does L. biflexa SemiSWEET.
TM1 was removed for clarity, and selected residues are shown as sticks (colours
are as in Fig. 1). The cross-protomer distance between L2-3 is shown as a
dashed line.
CAREERS


TURNING POINT Neurologist explores 
passion for public engagement p.455 


STEREOTYPING PhD Halloween costume 
snags on cleavage p.455 


NATUREJOBS For the latest career 
listings and advice www.naturejobs.com


Focus on people 


Nature announces this year’s outstanding science mentors 
in Ireland or Northern Ireland. 


BY PHILIP CAMPBELL 


Relentless commitment to the careers
of students and postdoctoral researchers
has distinguished the recipients of
Nature’s annual mentoring awards since the 
scheme’s inception in 2005. Winners devote 
much attention to their junior lab members, 
even as they maintain distinction in their 
discipline. 
This year is no exception. The judges of the 
2014 Nature Mentoring Awards confessed to
awe over the level of commitment to mentoring
that nominees exhibited. Many qualities make a 
good mentor (see Nature 447, 791-797; 2007), 
and 2014's entrants display these in abundance. 

Each year, the competition takes place in a 
different country or region; this year it hon- 
oured nominees in Ireland and Northern Ire- 
land (see go.nature.com/bacwn3 for details). 
Nature gives out two €10,000 (US$12,425) 
mentoring awards each year, one for mid-career 
achievement, the other for lifetime achieve-
ment. Each entry includes written statements
from five people who had been mentored by
the nominee at different stages of the nominee's 
career, as well as a statement from the nominee 
about his or her mentoring. Although the lat- 
ter might seem to force immodesty from nomi- 
nees, it actually helps to reveal their humility by 
illustrating their philosophy of service to their 
protégés. Above all, it is a collection of facts 
about the history of their mentoring and an 
opportunity to assess their thinking about and 
experiences in the roles of a mentor. 

The six-judge panel, chaired by Luke O’Neill 
of Trinity College Dublin, was drawn from 
disciplines across the natural sciences (see 
go.nature.com/nz8lya for the list). The panel 
also includes an observer-participant from 
Nature, who this year was myself. 

This year’s winners are Cormac Taylor, a 
cellular physiologist at University College 
Dublin; Cliona O’Farrelly, a comparative 
immunologist at Trinity College Dublin; and 
Martin Clynes, director of the National Insti- 
tute for Cellular Biotechnology at Dublin City 
University. They received their awards on 
3 November at the Science Foundation Ire- 
land Science Summit at the Hodson Bay Hotel 
in Athlone, Ireland. 


MID-CAREER ACHIEVEMENT 
Taylor won this year’s mid-career award. At a
time when the rigour and reproducibility of 
some science is in question, and lab leaders are 
under great pressure to deliver, it was gratify- 
ing to see in Taylor’s statement a strong com- 
mitment to robustness. He called appropriate 
statistical analysis, as well as sound experi-
mental design, ethics and data acquisition “key
cornerstone foundations for scientific success”,
and said that he aims to instil the importance of 
these qualities in his trainees early on. “I try also 
to balance positive reinforcement and encour- 
agement with a healthy dose of constructive 
criticism and scientific scepticism when dis-
cussing data with my lab members,” he wrote.
Of course, many mentors encourage rigour. 
Several nominators mentioned other qualities. 
One described how Taylor had helped to ease 
the common and frustrating career bottleneck 
from senior postdoc to independent scientist. 
The nominator had developed a niche 
research area that was aligned with, but dis- 
tinct from, the main research focus of Taylor's 
lab. “Cormac was unbelievably supportive of 
my pursuit of this research area and gave me 
the time, space, resources and mentorship to 
pursue this area in parallel with my primary 
projects at the time,” the trainee wrote.


Cormac Taylor (left) won the mid-career award for mentoring; and Cliona O’Farrelly and Martin Clynes share the lifetime award.


More than one person mentioned Taylor’s
endorsement of openness in the lab and around 
its research. “He always made the point that it 
is more important to present your unpublished 
data at conferences in order to be recognized 
scientifically rather than keeping the results 
secret in the fear of being scooped,” wrote one
trainee. Taylor’s philosophy helped that person 
to meet and become acquainted with many 
more researchers in the field than would have 
been the case otherwise, the trainee wrote. 


LIFETIME AWARDS 
O’Farrelly and Clynes share the lifetime- 
achievement award. Nominators wrote that 
beyond helping with their research, O’Farrelly 
demonstrated that researchers need not live in 
ivory towers. “Cliona is living proof that you 
can engage with people and things outside of 
science and still be a great scientist,” wrote one. 
“Too many scientists today are reclusive or dis-
engaged with the wider world around them.”
That engagement included understanding 
not only the potential of each lab member, 
but also his or her personal situation. One 
nominator described how O’Farrelly helped
her to balance parenthood with science. After 
the student returned from maternity leave, 
O’Farrelly insisted that she work fewer hours 
each week. “Cliona said people who are happy 
will get more work done, and she was right — 
it was actually the most productive year of my 
PhD.” The nominator added that O’Farrelly
showed in many other instances that she is a 
consistent advocate for women in science. 
O’Farrelly’s humanity is enhanced by humil- 
ity and generosity with time and ideas, wrote 
another nominator. “Her openness and will- 
ingness to admit how much she doesn't know 
(and how much is not yet known) instils an 
unquenchable curiosity in her mentees. It 
showed me that scientists are human too, even 
the high-performing ones.” 


That humanity extends to helping people 
through the hard times that afflict any junior 
researcher. Whenever graduate students hit the 
proverbial wall, O’Farrelly would fish them from 
the ‘Slough of Despond’ and have them review 
their first lab books with her. “Even for one at 
the lowest ebb of self-esteem, it is a revelation to 
see just how plug-ignorant and clueless you were 
when you started,” the nominator wrote. “You
cannot help but feel better when it is clear that 
you have learned so much, and that your toolbox 
is so much better filled with sharper tools now.”
The nominator added that this philosophy didn't 
make O’Farrelly a soft touch. If someone needed
an ultimatum and “a quite-brutal shove ... she 
didn’t shy away from it”. 

In 35 years, Clynes has amassed a portfolio of
some 150 students and postdocs, now scattered
across many nations and in many roles
inside and outside academia, including
major established companies and new
start-ups. In his statement, he highlighted
the virtues of a collective approach to
mentoring. Making sure that younger scien-
tists have multiple mentors protects against the
“dominance” of a single opinion, Clynes wrote. 
He carries the idea of multiple perspectives 
into lab meetings, saying that he encourages 
“discussion of problems with science” and that 
final decisions should not always be made by 
the scientist with the highest status in the lab. 
Clynes also supports constructive criticism 
and says that public humiliation and personal 
attacks should never take place. Labs should 
emphasize moving ahead from failure and 
avoid assigning blame, he said, and lab heads 
should praise success and encourage effort. 
Testimonials from his nominators show
how Clynes’s philosophy has helped to foster a
comfortable culture in the lab. One nominator 
pointed to Clynes’s skill in balancing honesty 
with tact. “He can always be trusted to give a 
student or a colleague a frank and truthful opin-
ion,” the person wrote. “Although he is never
cruel or overly blunt, he doesn’t sugar-coat 
things that can be difficult to hear at the time.” 

Such honest feedback generates confi- 
dence, wrote another nominator. “Martin 
repeatedly put me in situations that allowed 
me to develop, grow, to take responsibility and 
accountability because he was able to see quali-
ties that I did not yet see in myself.” However,
the mentee said, Clynes ensures that people 
earn that sense of confidence honestly. Clynes 
taught junior researchers to ask hard questions 
about their own and others’ work — “not to be 
a contrarian”, the mentee wrote, “but to con-
tinually improve and remain open to other
possibilities and options.” 

Clynes was uncanny about selecting criti- 
cal moments to challenge his students, wrote 
another. When the graduate student was fac- 
ing burnout a year before finishing a PhD, 
Clynes told the person to stop lab work, sum- 
marize the research outcomes thus far, plan 
the next stage and prioritize the remaining 
work. The nominator called that experience 
a “seminal moment”, because it provided 
much-needed big-picture perspective. It also 
taught the trainee “the value to slowing down 
to speed up” — a lesson that the person now 
passes on. 

Perhaps Clynes’s approach is best summed
up by one of his nominators. “Martin’s biggest
mentoring technique is his unwavering invest-
ment in people,” the person wrote.

That is a fine mission statement for mentors, 
and one that would apply to many winners of 
the Nature competition over the years.


Philip Campbell is editor-in-chief of Nature. 


CLAIRE WELSH/NATURE (O'FARRELLY PHOTO FROM SARAH WHELEN) 


CAMBRIDGE COGNITION 


TURNING POINT 


Kate McAllister 


As a PhD student at the University of 
Cambridge, UK, Kate McAllister wrote 
articles, designed a neurology course for a lay 
audience and worked on videos and podcasts. 
This October, the clinical neuroscientist took 
home a Science Communication Award 
from the Society of Biology, a UK advocacy 
organization. 


What shaped your early-career aspirations? 

I avoided science during my undergraduate 
studies in psychology at the University of Glas- 
gow, UK, until the end of my degree, when a 
good teacher got me interested in biology and 
neuroscience. I did my master’s at the Univer- 
sity of Cambridge, working on mouse models 
of Huntington's disease, and spent three years 
as a research assistant in clinical neuroscience.
I just wrapped up my PhD on mitochondrial
function in people with Down's syndrome. 


When did science communication become 
important to you? 

During my time as a research assistant, I 
worked on Prader-Willi Syndrome, an inher- 
ited disease that often leads to obesity. I was 
asked to write for a newsletter that went out 
to families and patients. I'd always been inter- 
ested in writing, and explored opportunities 
with the university’s science magazine. As my 
interest grew, I came across a public-engage- 
ment training course called Rising Stars, 
funded by the Higher Education Funding 
Council for England. For the course, another 
trainee and I worked with a film-maker to 
create a short film called The Scanner, on using 
brain imaging to understand the syndrome, 
which won the Digital Revolution award at 
the Sheffield Doc/Fest in 2010. I got so much 
nice feedback, especially from patients’ fami- 
lies, that I realized science communication is 
important and hugely worthwhile. 


Describe your other communication pursuits. 
I’ve found that once you do a bit of outreach,
people ask you to do more. I helped to put 
together a course on neuroscience for lay 
people that proved popular. I also worked on 
a podcast for a radio show called The Naked 
Scientists. The British Film Institute also 
asked me to consult on a travelling live event 
focused on cognitive enhancement, which was 
an interesting combination of art and science. 


Were you ever discouraged from pursuing 
these interests? 

No. My PhD supervisor was very encourag-
ing. He could see how, if I was interacting with
lay people, it was important for me to broaden
my communication experience, and that that 
would also help my interactions with study 
participants. I think support for these activities 
is very adviser-specific. The important thing 
is to show that it is a worthwhile endeavour 
and relevant to the group’s work. For example, 
my involvement in the documentary helped 
to bring attention to Prader-Willi Syndrome. 


Are these types of award important? 

Yes. Communication is becoming such an 
important part of our jobs as scientists, and 
with funding getting so much more competi- 
tive, you have to be able to talk to people about 
your science. You can't hide away any more. 


What do you plan to do next? 

I don’t want to close the door on academia,
but I started a job recently at a neuroscience
start-up firm called Cambridge Cognition. I’m
working as a scientist, but am also involved 
in academic collaborations. We use comput- 
erized touch-screen tests to assess different 
aspects of cognitive function. The results can 
be used, for example, by academics wanting to 
link cognitive function to different brain cir- 
cuits or by drug-makers who want to detect 
the cognitive effect of a candidate drug. My 
interest in science communication will con- 
tinue, but it will probably take a different form. 


How have your science-communication 
efforts influenced the way you work? 
Writing about other scientists’ work forces 
you to appreciate what others are doing and 
how your work fits into the bigger picture. As 
well, I’ve found collaborations I wouldn't have
stumbled on otherwise.
POSTDOCS


Office poll 


The number of postdoc advice offices at US 
research institutions has ballooned to 167, 
up from around 25 in 2003, according to 
the US National Postdoctoral Association 
(NPA) in Washington DC. The NPA 
surveyed offices to learn about postdoc 
demographics, policies and compensation. 
Covering an estimated 79,000 postdocs, the 
offices coordinate services such as career 
guidance, training and visa information. 
But very few of the 74 institutions that 
completed the survey track career 
outcomes. Most worrisome, says NPA 
executive director Belinda Huang, is that 
70% of offices operate on $40,000 or less
a year. “We're concerned about how small
these budgets are for the numbers of
postdocs they are serving,” she says.


EDUCATION 
Graduate feedback 


The US National Science Foundation
in Arlington, Virginia, has launched an
online forum to gather input about the 
future of graduate education. The impetus 
came from several years of reports from 
federal agencies and others that found 
that existing graduate programmes do not 
adequately prepare students for careers 
outside academia, says Ryan Bixenmann, 
part of the team that will maintain the
discussion at nsfgradforum.wordpress.com.
The forum will collect feedback
on: mentoring, attracting women and
minorities, preparing for jobs outside 
academia, building non-technical skills, 
and other issues. “We wanted input from 
the stakeholders,” Bixenmann says. 


STEREOTYPING 
PhD costume slammed 


A low-cut, crotch-length graduation gown 
with a mortarboard marketed as ‘Delicious 
Women’s PhD Darling Costume’ has been 
garnering ire and jokes since being offered 
on Amazon this Halloween. Almost 
two-thirds of around 350 reviewers give it 
the lowest possible rank. Carol Colatrella, 
who co-directs the Center for the Study of 
Women, Science, and Technology at the 
Georgia Institute of Technology in Atlanta, 
says that the gown sexualizes women. 
“This is a subtle way of digging at them 
and saying ‘you're just a woman’ or ‘you're
a sexual object’,” she says. Such outfits are
not limited to costume suppliers; in 2012, a 
European Commission campaign to attract 
more girls to science was criticized in part 
for featuring similarly short skirts. 
WHEN THE MUSIC ENDS


BY PHILIP BALL 


I listened to Henry Purcell’s When I am
Laid in Earth this morning, although
I know it is a crime. When I cried, it
wasn't because I felt ashamed. They were 
joyous tears; I was undone by beauty. 

But that’s not why I’m crying now, 
hammer in hand, shards of black 
shellac at my feet. It’s possible 
that no one now will hear Dido's 
lament ever again. I can't bear that 
thought, but what choice did I 
have? Now I see why music was 
so dangerous. 

Let me explain: yes, I can hear 
music. We do exist, the rumours 
are true. There are all kinds of rea- 
sons why some people evade the 
embryo screening, but I guess my par- 
ents’ motivations were the usual sort: a 
quirk of their genetic combination made 
it impossible to conceive a healthy child, to 
pass on their congenital amusia, and they 
couldn't afford the cost or risk of the precarious
gene therapies we have now. It’s cheaper,
in the end, to bribe the doctor. 

Of course, many of us musics never even 
discover our condition, not really. Perhaps 
we feel a weird thrill at the song of a night- 
ingale, even at the shallow prosody that, in 
spite of ourselves, speech still retains. But I 
have real music to listen to. 

You see, my great-grandfather ran a 
museum of music technology, and his col- 
lection of long-playing records survived the 
digital purge, that mass wiping of melodi- 
ous data. He knew it was dangerous but he 
couldn't help himself, he kept all these discs 
and an old hand-wound gramophone in this 
remote retreat in the hills. That's where I’ve 
spent the past month and more, cranking 
my way through Albeniz, Albert Ammons, 
Aerosmith — names whispered fearfully 
now, like a catalogue of medieval demons. 
It was when I got to Bach that I began fully to 
understand how perilous this stuff was. Yet 
only Purcell has tipped me into destruction. 
I feel I have betrayed my ancestors, but it’s 
either that or betray your children. 

My great-grandfather’s diaries give a 
truer picture than the official accounts. The 
problem, I now see, was when we found a 
way to explore musical space automatically.
If it hadn't been for music-generating
algorithms, we'd have probably languished
indefinitely in this harmless territory all 
around me, where music could do nothing
worse than make us weep, or laugh, or dance
or recover the will to live.

Of course, no one meant to develop such 
a lethal strain of music, even the approved 
histories will admit that. Those researchers
had no idea that such a fatal realm of musical
space existed. But once they put emotionality
into the fitness functions of their
genetic algorithms, it was inevitable that the 
computer’s compositions would start drift- 
ing towards that place. The commercial sys- 
tems were crude explorers: users, craving 
the exquisite and blissful pang, could and 
did turn up the setting to full while only ever 
encroaching on the borders of the danger- 
ous terrain. No, it was in the laboratory that 
the advanced tools existed to carry the quest 
across the boundary. 

I now see that those explorers were not,
as we have been told, reckless fools. They 
couldn't have known or even suspected what 
lay in wait, their benign intentions untram- 
melled by a knowledge of music’s devastat- 
ing seductions. The better it got, the more 
vindicated they felt. There were no con- 
tainment procedures — why should there 
have been? Such a short step, in the end,
from telling friends and colleagues “You [...] stupor that became
the symptom of imminent succumbing.
And who could ever have contained that 
digital virus, spreading through our invis- 
ible networks in just a few terrible weeks, 
depriving everyone beyond the age of 
infant-learned receptivity of their 
will to work, to eat, to remove their 
headphones even in the face of their 
deepest instincts for survival and 
progeniture? Like all viruses, it 
adapted itself to the local circum- 
stances: the deadly trance was 
soon induced by hyper-emotive 
gamelan, hymn, tribal chant, 
generated in inexhaustible sup- 
ply. The economy collapsed and 
people starved, for in the end 
music is not a food of any sort. 
Only the total amusics survived, 
and only those whose condition was 
congenital could hope to breed. Darwin 
would have understood, for he would have 
been among the survivors. 

Real dangers always beget taboos and 
then laws. Those decades after the disin- 
tegration saw such hardship and horror 
that it’s no wonder amusia is now a legal 
condition of carrying a fetus, even while it 
is a crime to possess instruments or record- 
ings one cannot use to any effect. But still 
we have emerged again, knowing we must 
keep our condition hidden. And even if the 
digital world is rigorously monitored for 
anything that might be considered musical 
(to the extent that amusics can judge), it has 
never been possible to eliminate all vestiges 
of humanity’s past passions. 

When I discovered this hoard, at first 
its contents made little auditory sense. But 
we are still pattern-seekers, and it didn’t 
take me long to hear, and finally to adore, 
music’s cognitive games. Every disc is a 
revelation: Couperin, Abbey Road, Judy 
Garland’s Over the Rainbow. I think of 
Bach sent floating in golden grooves a cen- 
tury ago towards other stars, and wonder: 
have we polluted the cosmos, or after all 
enriched it? 

But that Purcell. Finally, I saw beyond this 
rapture to something of such overwhelming 
beauty that I needed, in my fear, to shatter 
it. Now I stand here grasping the weapon, 
all these glittering black discs laid out before 
me. 


Philip Ball is an author. His latest book 
is Invisible: The Dangerous Allure of the 
Unseen (Bodley Head). 
Melanoma is the deadliest form of skin cancer and
strikes tens of thousands of people around the
world each year. The number of cases is rising faster
than any other type of solid cancer (see page S110). 

It is usually caused by too much exposure to the Sun’s 
ultraviolet radiation. But the link between sunshine and 
melanoma is not as straightforward as it seems. The pattern 
of exposure can be just as important as the total amount of 
ultraviolet radiation that reaches the skin (S112). 

Because the cause of melanoma is so well known, it seems 
strange that the incidence keeps rising. But although we have 
the tools to prevent the disease, we do not always use them 
(S117 and S126), and not enough people take action to reduce
their risk. Australia, which has the highest rate of melanoma, 
has been slowly getting the disease under control and may 
have some lessons to teach the rest of the world (S114). 

For those hoping to skip the demands of a sun-safe routine
and simply take a sunscreen pill instead, the news is not so 
good. There is little evidence that any drug will be able to offer 
full sun protection (S124). 

For those who do develop melanoma, however, the chances 
of recovery are rising. Targeted treatments and therapies that 
use the body’s own immune system have been developed in 
the past few years (S118). 

Although melanoma is primarily an affliction of the fair- 
skinned, it can also strike those with a darker complexion. 
The disease in black populations seems to have a different 
biology to that in lighter-skinned people, and is also 
particularly deadly (S121). 

We are pleased to acknowledge that this Outlook was 
produced with support of a grant from Bristol-Myers Squibb. 
As always, Nature retains sole responsibility for all editorial 
content. 
THE CANCER THAT RISES WITH THE SUN


Melanoma is an aggressive cancer that normally starts in the skin. It can strike anyone but 
is most common in people with pale skin, and it is getting more common. By David Holmes. 


THE MARCH OF MELANOMA 


Melanoma is a cancer that starts in cells called melanocytes, which make 
the pigment melanin. It usually starts in a mole and is strongly linked with 
exposure to UVA and UVB radiation from the sun or sunbeds. However, it 
can occur in any tissue that contains melanocytes, such as the eye or the 
intestines. Genetic factors can also increase the risk of melanoma. At 
diagnosis, a melanoma has a numerical stage based on how deeply
it has grown into the skin, and whether it has spread to other parts
of the body. 


[Figure labels: Melanin; Melanocyte. Stage descriptions, partly lost in extraction:
... spreading to lymph nodes or other parts of the body. It can be easily removed by surgery.
... sites. Most stage 2 melanomas can still be treated with surgery.
... lymph nodes, or other areas of the skin. Treatment depends on which areas are affected.
... melanomas come back after surgery.]
INCREASING BURDEN


The incidence of melanoma is increasing faster than that of any other solid tumour, although the mortality rate has remained largely flat. 


Figures shown are for the United States, where melanoma is now the fifth most common form of cancer. 


[Figure: new cases of melanoma and deaths from melanoma in 2014. Melanoma accounts for 4.6% of new cancer cases and 1.7% of deaths from cancer.]
SOURCES: Cancer Research UK/Surveillance, Epidemiology, and End Results Forum
GLOBAL INCIDENCE


Melanoma is the 19th most common cancer worldwide, with around 232,000 new cases diagnosed in 2012, accounting for 2% of all cancers. The highest rates of
melanoma occur in countries where the inhabitants are predominantly light skinned. Northern Europe and North America have the highest incidence rates in the 
Northern Hemisphere, and Australia and New Zealand have the highest incidence in the south. The burden of melanoma in South America and Asia is relatively low. 


The highest incidence rates are found in New Zealand and Australia. The highest recorded incidence is in Queensland, Australia: 56 cases

Cases per 100,000 people per year


United States incidence map
The highest rates of melanoma in the United States occur in the northwest and southeast states, reflecting the higher proportion of the population who are of non-Hispanic white ethnicity in those states.


Europe incidence map
Switzerland has the highest incidence of melanoma in Europe, with 25.8 cases per 100,000 people per year. Southern European populations have the lowest burden of melanoma. The incidence is highest in Northern Europe, particularly in Nordic countries.

Cases per 100,000 people per year
BEYOND THE PALE


Anyone can get melanoma but it usually afflicts people with light skin, and it is more common in men than in women. In the United States, it is more common among non-Hispanic whites than people of other races and ethnicities.


TIME IN THE SUN


In the United States, melanoma of the skin occurs most often in people aged between 55 and 64.


The timing and pattern of exposure to the sun can alter the chance of someone developing melanoma.


RISK FACTORS 


Riddle of the rays 


Spending time in the sun is a major risk factor for melanoma, but the relationship is not as
straightforward as it seems.


BY CASSANDRA WILLYARD 


People all around the world have been
bombarded with the message that the
single biggest risk factor for skin cancer 
is spending time in the sun, and that limiting 
their exposure is the best way to stay safe. In 
Australia, a cartoon seagull advised people 
to slip on a shirt, slop on some sunscreen, 
and slap on a hat. In Dubai, one advertising 
agency distributed coffin-shaped beach towels 
printed with the words: “Over-exposure to the 
sun causes skin cancer killing 20 people every 
day.” And posters in Canada proclaim: “No 
tan is worth dying for!” But although the link 
between sun exposure and melanoma is clear, 
it is far from straightforward. 
Consider, for example, Merideth Cooper, a 
24-year-old graduate student who discovered 
a suspicious mole on her back while shopping
for bras. A week later she went to the doctor
to have the mole removed, along with another 
suspicious mark on her thigh. Both turned out 
to be melanomas. But the diagnosis did not 
seem to make sense. Cooper had been to the 
tanning salon a few times but wasn't a regu- 
lar user. And she had been sunbathing during 
the spring break, but she was not one of those 
girls who spent her summers lying in the sun. 
“I know people who are out in the sun way 
more,” she says.

The damage that triggers melanoma often 
starts with the absorption of ultraviolet radia- 
tion, so it makes sense that more sun would 
confer more risk. But that is not always true. 
The timing and pattern of exposure are also 
crucial. Furthermore, some individuals are 
more susceptible than others. “When you 
put all those factors into the mix, it can make 
a complicated story,” says David Whiteman,
a melanoma researcher at QIMR Berghofer
Medical Research Institute in Herston, Aus- 
tralia. While Whiteman and other epidemi- 
ologists try to make sense of this complexity, 
some researchers are exploring the role of 
other environmental risk factors. 


SPORADIC EXPOSURE 

Melanoma begins in melanocytes, the cells that 
give skin its colour. These cells contain a pig- 
ment called melanin, which absorbs damaging 
ultraviolet rays from the sun. Exposure to the 
sun drives most forms of the disease, but the 
connection is complicated. “Melanoma is not 
one disease, it’s a collection of diseases,” says 
Martin Weinstock, a dermatologist at Brown 
University in Providence, Rhode Island, and 
they have different risk factors. For example, 
the rare melanomas that arise on the palms of 
the hands or the soles of the feet, on mucous
membranes, or under fingernails and toenails,
don't seem to be linked to ultraviolet exposure
(see page S121).

But even for the most common forms of the 
disease for which sun exposure is a known risk 
factor, the data can be confusing. “You might 
expect that if you work in the sun all day, if 
you're a gardener or something, that you might 
have particularly high rates of melanoma,” says
Anne Cust, an epidemiologist at the Univer- 
sity of Sydney in Australia. “But that doesn't 
seem to be the case.” Indeed, some studies have 
found that outdoor workers actually have a 
lower risk of developing melanoma than those 
who work indoors¹,².

Instead, the greatest risk seems to come from 
intermittent sun exposure and sunburn, and 
the use of sunbeds. One Canadian study, for 
example, found a strong link between activi- 
ties associated with intermittent exposure, 
such as beach vacations, and an increased risk 
of melanoma³.

Researchers are still trying to tease out why 
that might be. One idea is that skin exposed 
continuously to sunlight adapts and becomes 
better at repairing the DNA damage caused 
by ultraviolet radiation. Another idea is that
the increased production of melanin
might form a protective shield against the
harmful rays.

But a more controversial hypothesis
involves vitamin D. Sunlight helps the
body to synthesize its own vitamin D,
and some researchers think that people
who spend a lot of time outdoors might be pro- 
tected from developing melanoma by having 
higher levels of the vitamin. But the evidence 
is limited and the causality is ambiguous. “We 
still haven't decided whether vitamin D is the 
result of good health, or whether it leads to 
good health,” says Marianne Berwick, a cancer
epidemiologist at the University of New
Mexico in Albuquerque. 


TWO ROADS DIVERGE 

Furthermore, not every study has found a 
strong link between intermittent sun and 
melanoma. Whiteman thinks this is because 
intermittent exposure is only part of the story. 
Over the past decade, he has been analysing 
where and when melanomas occur, and he 
has found additional nuances. For example, 
chronic exposure does seem to be a risk fac- 
tor, but only for certain people. Outdoor work- 
ers tend to get their melanomas on exposed 
areas of skin — the face, ears, neck and scalp 
— when they are in their 70s and 80s. People 
who develop the disease earlier in life tend to 
have had more episodes of acute sun exposure 
early in life, he says. In this group, melanoma 


tends to occur in parts of the body that are only 
occasionally exposed to the sun, such as the 
back, abdomen, upper legs and arms. 

Whiteman argues that these differences are 
at least partly due to differences in people’s 
propensity to develop moles. It makes sense 
that a greater tendency to develop moles may 
indicate the presence of melanocytes that read- 
ily proliferate. Indeed, individuals who have 
more moles have a higher risk of melanoma. 
In these people, Whiteman says, short bursts of 
intense sunlight early in life might be enough 
to kickstart the molecular events that lead to 
the cancer. Melanocytes are still maturing in 
young people, and those on the trunk seem 
to mature more slowly. In people who do not 
tend to develop moles, however, the process 
might require more prolonged sun exposure. 
Whiteman calls this hypothesis the ‘divergent 
pathways’ model. 

In 2003, Whiteman attempted to test this 
model. He compared people who developed 
malignant melanomas on their trunks with 
people who had them on their heads and 
necks. Almost everyone in the study had at 
least one mole, but those with melanomas of 
the head and neck tended to have fewer moles 
than those who developed melanomas on their 
trunk. They also reported greater occupational 
sun exposure⁴. A handful of other studies have
reported similar results (see ref. 5, for example). 

Whiteman is still refining his theory. “Initially,
our model was that there are two pathways,”
he says. But molecular investigations
suggest that there are more than that, and that 
different patterns of sun exposure damage dif- 
ferent genes. “As we combine our knowledge of 
molecular science with epidemiology, we can 
start to untangle these pathways a bit more 
clearly,” he says.


BEYOND THE SUN 

We know that sunburn — a marker of inter- 
mittent exposure — seems to roughly double 
an individual's risk of developing melanoma. 
But we don't know whether other environmen- 
tal factors play a role too. “You would think 
that if the sun were the only cause it would be
much stronger, as in cigarette smoking and
lung cancer,” Berwick says. (Smokers are 15–30
times more likely to develop lung cancer than 
those who do not.) 

Studies in the 1980s and 1990s examined 
the relationship between people’s workplace 
and their risk of developing different types of 
cancer. Some studies found a potential link 
between melanoma and organic chlorine 
compounds, a class of chemicals that includes 
PCB, an industrial chemical that was banned 
decades ago. 

Richard Gallagher, an epidemiologist who 
studies cancer risk at the BC Cancer Agency in 
Vancouver, Canada, decided to revisit the link 
using existing data and blood samples. He and 
his colleagues found that those with the high- 
est levels of PCBs in their blood had a sixfold 
greater risk of melanoma than those with the
lowest concentrations⁶. Gallagher is working
on a larger study to see if the association holds,
but the link to PCBs seems plausible. “They
can produce reactive oxygen species, and perhaps
that renders people more susceptible to
other factors,” he says. Although PCBs are no
longer sold, they are still found in the environ- 
ment, with fish in particular containing high 
levels of the pollutant. 

Frank Meyskens, an oncologist at the Uni- 
versity of California, Irvine, thinks there may 
be another culprit: heavy metals, especially 
chromium. He became suspicious when he 
read that melanoma is unusually common 
in patients who have metal-on-metal hip 
replacements composed of alloys that contain 
cobalt and chromium. The US Food and Drug 
Administration warns that when the ball and 
cup of these hips slide against each other, they 
can release metal particles, some of which end 
up in the bloodstream. When Meyskens and 
his colleagues incubated melanocytes in the 
presence of a variety of metals, they found 
that cells exposed to chromium changed their 
shape and developed chromosomal abnormalities⁷,
supporting the idea that these metals can
cause skin cancer. 

Certain medications have also been impli- 
cated. This summer, a team of researchers from 
Harvard University in Boston, Massachusetts, 
found a link between malignant melanoma 
and sildenafil citrate (Viagra). The study fol- 
lowed nearly 26,000 men over 10 years. Men 
who had taken the drug were twice as likely to 
develop melanoma as those who did not. The 
drug inhibits a molecule called PDE5A, and
the team speculates that this might promote
the invasion of the primary tumour⁸.

Other environmental factors might provoke 
the disease too. Cooper, who is now free of can- 
cer, will never know the exact cascade of events 
that sparked her melanoma. But now that she 
has had the disease, she has an increased risk of 
recurrence, so she takes precautions. When she 
is out in the sun, she always wears hats and uses 
sunscreen. She keeps an inventory of her moles 
and is constantly looking out for changes. “I 
notice everything now,” she says. “You have to
almost be that cautious because you have to
catch them early.”


Cassandra Willyard is a freelance writer 
based in Madison, Wisconsin. 
The rewards of sunbathing can be immediate but the melanoma risk may seem distant and intangible.


Lessons from a 
sunburnt country 


Countries that can’t persuade people to stay safe in the sun 
could learn from Australia, melanoma capital of the world. 


BY ZOE CORBYN 


Before she leaves home in San Francisco,
California, Jennifer Schaefer dons long
sleeves and a big hat she calls her “personal
umbrella”. With her fair skin, red hair,
memories of bad childhood sunburn, and
a family history of skin cancer, Schaefer is
painfully aware of the dangers of exposure to
ultraviolet radiation, which accounts for the
vast majority of skin cancers. 

So she finds it mind-boggling how few peo- 
ple bother with sun safety, with most preferring 
sun worship to sun protection. “In our culture, 
it’s almost funny to be too sun protected,” she 
says, highlighting the way her friends tease her 
when she dons her bathing suit — a protective 
‘rash guard’ top and knee-length board shorts. 
“We're slowly starting to become aware of the
long-term effects of the sun, but it’s like global 
warming — people are not going to make seri- 
ous changes until they feel a direct impact.” 

That impact has helped push Australians, 
who are famous for sun loving, into changing 
their behaviour. With its high solar ultraviolet 
levels and predominantly fair-skinned popula- 
tion, Australia has the highest rate of skin can- 
cer in the world. But after decades of increase, 
the melanoma rate began to plateau in the mid 
1990s¹. The incidence of melanoma among
young people is now falling¹,², as national surveys
show that most Australians (more than
70% of adults and 55% of adolescents) no
longer prefer a tan³.


SLIP! SLOP! SLAP! 

One reason for the change is that Australia 
essentially hit saturation point, says Adèle
Green, a cancer epidemiologist at the QIMR 
Berghofer Medical Research Institute in Bris- 
bane. Melanoma was so common that most 
people knew someone who had suffered from 
it, so the need to act was obvious. There has 
also been an ongoing skin-cancer awareness 
campaign to educate the public⁴,⁵ that started
in the early 1980s with the well-known ‘Slip!
Slop! Slap!’ television commercial, in which
an animated seagull told Australians how to 
stay safe in the sun. The SunSmart programme 
today combines mass media campaigns and 
intensive work with schools, workplaces, local 
government, health professionals, parents and 
sports groups. Operating under the control of 
charities called cancer councils, with funding 
from state governments, the SunSmart pro- 
gramme has made Australia a world leader in 
preventing skin cancer. 

When Green was growing up, annual sunburn
for children was “just a fact of life”, she
says. As a teenager, she and her friends cooked 
themselves “like bacon and eggs” in suntan oil. 
Melanoma rates are still increasing among older 
people’ because damage done early in life can 
trigger malignancy decades later. But Green 
believes there has been a national change 
in mindset. “Generations born since ‘Slip!
Slop! Slap!’ have known nothing but a culture
imbued with sun protection messages,” she says.

Many other countries struggling to get 
their populations to make sun protection part 
of daily life would love a little of Australia’s 
magic. In July 2014, the US surgeon general 
issued a ‘call to action’ (go.nature.com/zy27zl) 
asking all sectors of society to come together 
to reduce exposure to ultraviolet radiation. 
“One of the reasons we put this report out is 
to do what Australia did years ago,” says Boris
Lushniak, acting US surgeon general. The 
report details increasing rates of skin cancer 
and says most people are not doing enough to 
protect themselves from the sun. One in three 
adults has had sunburn in the past year, it says. 
It also points to the high use of sunbeds by 
young white women, with nearly one in three 
engaging in the practice each year (see ‘Banning
indoor tanning’).

“We have increased knowledge but there is 
not a lot of evidence for changing behaviour,”
says Joel Hillhouse, a psychologist who directs 
the Skin Cancer Prevention Laboratory at East 
Tennessee State University in Johnson City. So 
why aren't people in the United States and else- 
where heeding the messages? What lessons can 
be learned from Australia? 

One powerful obstacle to people protecting 
their skin properly is our culture's view that a 
tan is attractive and healthy. “The social per- 
ception that tans are beautiful is a barrier we 
still as a society haven't overcome,” says Eleni
Linos, a dermatologist who studies skin-can- 
cer prevention at the University of California, 
San Francisco. Perpetuating this notion, says 
Hillhouse, is a multibillion-dollar tanning 
industry. 

Then there is the nuisance factor: protect- 
ing skin requires steps such as remembering a 
hat or applying sunscreen that can seem more 
trouble than they’re worth⁶. The risk–reward
balance works against sun protection in many 
people’s minds, says Carolyn Heckman, a 
psychologist specializing in skin-cancer pre- 
vention at the Fox Chase Cancer Center in 
Philadelphia, Pennsylvania. The risk of skin 
cancer can seem minor, distant and intangi- 
ble. By contrast, tanning can provide instant 
gratification. 

But there is nothing immutable about peo- 
ple’s affinity for the sun. Indeed, until the early 
1900s, pallor was popular in Europe and North 
America because it indicated an upper-class
lifestyle and an occupation that did not
entail outdoor labour (this idea still prevails
in many Asian countries). Then in the
1920s doctors began prescribing sunbathing
as medication for ailments such as
tuberculosis. Many people credit French style
icon Coco Chanel with making the tan chic 
by bronzing herself on a yacht in the Mediter- 
ranean. By the 1960s the bikini had arrived, 
and tanning beds further increased the popu- 
lation’s exposure to ultraviolet radiation. 

But our love of the sun is more than just cul- 
tural. Our biology makes it hard to stay away 
too. Frequent sunbathers and indoor tanners 
can exhibit symptoms of addiction. Mice 
exposed to a daily dose of ultraviolet radiation 
develop higher levels of the feelgood hormone 
β-endorphin within a week, and exhibit classic
symptoms of withdrawal when the endorphin
rush is blocked⁷. This effect may explain
why it feels good to go out on a sunny day, says
David Fisher, director of melanoma research at 
Harvard University in Cambridge, Massachu- 
setts, who led the mouse study. He believes it 
could be a relic of our evolution, dating back to 
when being outside in the sun could have con-
ferred health benefits and even saved lives by
triggering the skin to synthesize the vitamin D
required for strong bones.

Australian primary schools typically provide plenty of shade and encourage children to wear sun hats.

“Per exposure, the power of the euphoric 
effect is pretty small,” Fisher says. “But if people 
have just a modestly increased propensity to 
seek ultraviolet radiation, over a population of 
millions you have an increase in skin cancer.” 
Recognizing the addictive effect, he believes, 
could aid public-health efforts. For example, he 
argues that regulatory agencies should take a 
tougher stance with young people on sunbeds 
because of the possibility of dependence. And 
public-health messages could be enhanced 
by explaining to people that our physiology 
means we have less control than we think. “It 
might allow people to step back and look more 
objectively at their behaviour,” he says. 


MIXED MESSAGES 
Inconsistent public-health messages may also 
be hampering behavioural change. In 2012, 
DeAnn Lazovich, a cancer epidemiologist at 
the University of Minnesota in Minneapolis, 
compared the recommendations to prevent 
skin cancer from four US health bodies. They 
sometimes had different messages and ranked 
the order of protective actions differently. 
“Anyone trying to figure out what they ought 
to do might be a little bit confused,” she says. 
Linos is worried by a general overemphasis 
on sunscreen, the most common protective 
measure people take. Her research shows that 
sunscreen users get sunburn more frequently 
than those who seek shade or wear protective 
clothing. Although people may be more likely 
to apply sunscreen before prolonged exposure 
to the sun, she acknowledges, they often fail to 
apply it thickly enough to be effective. It can 
also lull users into a false sense of security. “People 
feel they can stay out longer,” she explains. 

Contradictory information about vitamin D 
has added to the confusion, says Martin Wein- 
stock, a dermatologist and community-health 
researcher at Brown University in Providence, 
Rhode Island. There have been suggestions 
that vitamin D can help prevent everything 
from cancer to diabetes (although a 2010 
Institute of Medicine report found insufficient 
evidence for any beneficial effect beyond bone 
health), and the tanning industry has seized 
on this, says Weinstock. So the public hears 
warnings about the need for sun protection 
juxtaposed with messages about the benefits 
of vitamin D. “It doesn’t take much contradic- 
tory messaging to really screw up the whole 
enterprise,” Weinstock says. 

Different countries resolve this conflict in 
different ways. The United States has encour- 
aged people to protect themselves from ultra- 
violet radiation and to get any additional 
vitamin D they need from supplements. But 
Australia advises people that they may need 
to seek sun exposure to ensure adequate vitamin D 
levels, which they can do safely by going 
outdoors without sun protection at times of 
day when ultraviolet levels are low. This ‘do no 
harm’ approach is balanced and realistic, says 
Craig Sinclair, who heads prevention at Cancer 
Council Victoria in Australia. But Weinstock 
disagrees, arguing that there is no guaranteed 
safe level of ultraviolet exposure. “A little bit of 
sun is not going to do you a lot of harm, but it 
will do you a little bit of harm,” he says. 

What's more, public-health messages haven't 
always been well designed for the demographic 
groups they are intended to target. Hillhouse 
has studied what motivates young women who 
use sunbeds to change their behaviour, and it 
has little to do with their health. “A young 
person’s view of skin cancer is that it is just so 
far off,” he says. It’s better to focus messages 
on something they care deeply about: their 
appearance. For young women, Hillhouse 
advocates stressing the link between ultraviolet 
exposure and wrinkles and, importantly, sug- 
gesting safe alternatives to achieve a socially 
desirable appearance, such as exercise. “Public 
health tends to take an almost religious view — 
you just tell people what is going to make them 
healthier and they will do it,” he says. But that 
approach is flawed, Hillhouse explains. “Psychology 
says we need to work with the person 
in ways that matter to them.” 


AUTOMATIC FOR THE PEOPLE 

One lesson Australia can teach other countries, 
says Sinclair, is that prevention campaigns 
require sustained resources. “Every time we 
take our foot off the pedal and reduce our 
investment, we get a regression in behaviour,” 
he says. 

Indeed, funding for prevention campaigns 
in the United States has only ever been spo- 
radic — there has never been a serious national 
campaign. “The resources we have put into 
stopping smoking, drunk driving or AIDS 
have never been put into skin cancer,” says 
Hillhouse. In the United Kingdom, where rising 
skin-cancer rates are thought to be driven 
by the popularity of cheap overseas travel and 
indoor tanning, the charity Cancer Research 
UK has run a prevention campaign for the past 
decade. It is based on Australia’s SunSmart 
brand but the investment has only been “very 
small” in comparison, says Sinclair. Yet pre- 
vention provides value for money by reducing 
expensive treatment costs: every Aus$1 spent 
on SunSmart in Australia delivers a net saving 
of $2.30 (ref. 12). 

Another important lesson — also apparent 
from anti-smoking campaigns — is that an 
educational component alone is not enough. 
Mass media campaigns targeted at changing 
individuals’ behaviour have to be backed by 
policies and legislation. “Just personal choice 
is not going to do it,” says Green. Australian 
primary schools, for example, have adopted ‘no 
hat, play in the shade’ policies, and also have 
commitments to provide sufficient shade in 
school grounds. Sunscreen is available in 
classrooms, and sun protection is taught to 
children of all ages. By contrast, many US primary 
schools ban hats on the school grounds 
(partly to discourage cliques) and only allow 
sunscreen to be dispensed by a school nurse. 
“We would like those students to be allowed 
to use proper sun protection,” says Lushniak. 

Australia has succeeded, says Linos, because 
it has coupled its educational campaign with 
efforts to make it easy to use sun protection. “If 
you make it automatically part of daily life it is 
much easier,” she says. It takes less effort to stay 
in the shade where there is plenty available, to 
pay attention to the ultraviolet index when it 
is part of the weather forecast, and to persuade 
children to wear hats when they are used to 
wearing one at school. 

Meanwhile there is some cause for optimism 
outside Australia. Attitudes have started to 
change. Hillhouse says he has unpublished US 
data showing that a mild, rather than dark or 
moderate, tan is now preferred. In his study, 
participants sought “just enough tan to take 
away the pale look.” And analysis of American 
women’s fashion magazines over several decades 
shows that models are not as tanned as 
they used to be. 

A 2013 study shows that, in addition to Australia, 
a handful of countries — notably New 
Zealand, Canada, Israel, Norway, the Czech 
Republic (for women) and the United States 
(for white men) — have melanoma rates that 
are declining or stabilizing among young people. 
“Very slowly we seem to be turning the 
tide,” says Green. 

Researchers say the US surgeon general’s call 
to action will need to be backed by funding 
to have the greatest effect, but they hail it as 
a step in the right direction. Sun safety “has 
been elevated to a public-health priority now”, 
says Lazovich. “It gives groups something to 
get behind,” adds Weinstock. 

Back in San Francisco, Jennifer Schaefer is 
doing her best to educate the next generation. 
Her eldest daughter automatically puts on a hat 
to go outside. “Habits really start in childhood 
— it is like brushing your teeth,” she says. 


Zoé Corbyn is a freelance journalist based in 
San Francisco, California. 


BANNING INDOOR TANNING 

The campaign against sunbeds 

It is hard to overstate Clare Oliver’s role in 
Australia’s campaign against sunbeds. She 
was a 26-year-old journalist who died of 
melanoma in late 2007, but she devoted 
the last month of her life to publicizing 
the dangers of indoor tanning, which she 
blamed for her melanoma. The media 
frenzy that followed her appearance on 
television led the state of Victoria to become 
the first in Australia to announce it would 
ban people younger than 18 from using 
commercial tanning beds. Other states 
soon followed, but what Oliver started 
didn’t stop there — at the end of 2014, 
all Australian states will ban commercial 
indoor tanning completely. Australia will be 
the second country after Brazil, which took 
action in 2009, to have imposed such a ban. 
The World Health Organization classified 
sunbeds as carcinogenic in 2009. 

Many European countries have also 
legislated to ban access to sunbeds for 
minors, including the United Kingdom 
(Scotland in 2009, England and Wales in 
2011, and Northern Ireland in 2012). The 
ban couldn’t come soon enough. It is well 
established that melanoma incidence is 
lower in the north of England than in the 
sunnier south, but the high prevalence of 
indoor tanning among young women in the 
north of England is thought to be one reason 
why they buck the trend. 

Eleven US states, led by California in 
2011, have prohibited indoor tanning for 
those under the age of 18 (others have 
weaker restrictions and 10 states have 
none at all). In May 2014, the US Food and 
Drug Administration reclassified tanning 
beds from low risk (class I) to moderate risk 
(class II), and it now requires manufacturers 
to include a warning advising against their 
use for people younger than 18. “Society 
makes the decisions,” says Boris Lushniak, 
the acting US surgeon general. “But this is 
needless exposure to ultraviolet radiation, 
a known carcinogen.” Z.C. 


PERSPECTIVE 


Catch melanoma early 

The United States and other nations should follow Germany in 
routine skin screening, say Susan M. Swetter and Alan C. Geller. 

Melanomas can be treated most effectively if they are caught 
early when they are thinner. The best way to make sure this 
happens is to have a doctor or other health-care provider 
perform skin examinations, rather than to rely solely on the patient. 

However, in 2009, a lack of clinical-trial data on the effect of screening 
on melanoma mortality left the US Preventive Services Task Force 
(USPSTF) unable to recommend routine skin-cancer screening of the 
general population by primary-care doctors. The USPSTF pointed 
out that the harms of such screening — such as physical and psychological 
effects related to misdiagnosis, overtreatment and unnecessary 
biopsies — had not been adequately addressed. 

Since then, however, evidence for improved outcomes following 
skin screening has mounted. A population-based study of the residents 
of Queensland, Australia, with first primary 
invasive melanoma (which invades the deeper 
layers of the skin) showed a 40% lower risk of 
being diagnosed with thick (≥3 mm) melanoma 
if a skin exam was performed in the three years 
before diagnosis, resulting in a predicted 26% 
fewer melanoma deaths over five years. 

An employee education and screening programme 
at the Lawrence Livermore National 
Laboratory from 1984 to 1996 was associated with 
a nearly 70% reduction in thick melanoma diagnosis 
and significantly fewer melanoma deaths in 
the workforce than expected according to California 
mortality data. A subsequent multicentre 
observational study of 566 US adults with invasive 
melanoma found that patients who underwent a 
full-body skin examination by a physician in the 
year before diagnosis were twice as likely to have 
a thinner (<1 mm) melanoma. Men over the age of 60 benefited even 
more, with four times the odds of having a thinner tumour. 


ROUTINE CHECKS 

The most compelling population-based data are from a skin screening 
programme in the German state of Schleswig-Holstein in which 
almost 20% of the adult population over the age of 20 — more than 
360,000 people — were screened during a one-year period in 2003 and 
2004. Five years later, melanoma mortality had declined by nearly 50% 
compared with surrounding states. The results convinced Germany 
to roll out the programme nationwide to all adults aged 35 and older 
in 2008. So far, nearly 30 million screenings have been done, and data 
on the programme’s effectiveness should soon be available. 

These studies suggest that routine skin examination by primary-care 
doctors may be a practical strategy for reducing mortality from 
skin cancer. The USPSTF is reconsidering its recommendations and 
calling for a systematic review of current screening practices. 

But for now, routine skin examination is far from the norm in the 
United States. Only 8–21% of people receive an annual skin exam 
from their doctor, even though primary-care physicians find more 
melanomas than do dermatologists. Americans make 1.7 visits to the 
doctor each year on average, and elderly people, who are at greatest 
risk of fatal melanoma, make many more. So primary-care providers 
could be an important source of skin-cancer diagnosis and triage. 

It should be possible to incorporate screening into the primary-care 
workflow. It would take a trained physician only a few minutes, as part 
of a routine physical exam, and could reveal melanomas in high-risk 
areas not easily viewed by the patient, such as the back. Not all doctors 
are trained to identify early skin cancer, however. A 1.5-hour, web- 
based scheme called INFORMED (Internet Curriculum For Melanoma 
Early Detection) provides training and clinical guidance for the early 
detection of melanoma and other common skin cancers by primary- 
care providers. Preliminary data from the two integrated health-care 
systems that have used INFORMED suggest that it improved the abil- 
ity of doctors to recognize both benign and malignant skin lesions, 
and that it also decreased dermatology referrals, 
particularly to assess benign skin lesions. 

Implementing widespread skin screening 
requires a shift in the way that primary care is 
delivered, however, as routine physical exami- 
nations are becoming less common. In the 
present atmosphere of cost-cutting, recom- 
mendations from the USPSTF and greater con- 
sensus from other organizations are crucial to 
ensure that patients receive appropriate screen- 
ing for melanoma. In the interest of reducing 
deaths from melanoma, the USPSTF should 
consider all the recent data from worldwide 
screening efforts. 

The growing body of evidence seems to tip the 
scales in favour of using screening by physicians 
for melanoma, but there are questions over how 
to do it. Who should perform, receive and pay 
for the screens? Training ancillary health-care providers (such as nurse 
practitioners and physician assistants) could be beneficial, as could 
compensating providers for carrying out full-body skin exams during routine 
medical visits. Preliminary data from Germany suggest that screening 
can save lives, but other studies are needed to understand the possible 
harms of skin screening, along with potential cost savings for the health 
system. These will vary from country to country but must be understood 
if skin screening is to be widely incorporated into primary care. 


MELANOMA 

DRUG DEVELOPMENT 


A chance of survival 


People with advanced melanoma are living longer thanks to treatments that target 
cancerous cells or encourage the immune system to wipe out the tumour. 


BY HANNAH HOAG 


When Antoni Ribas began treating 
metastatic melanoma 15 years ago, 
he faced a lot of difficult conversations 
with his patients. Few treatments were 
available for those in the advanced stages of 
the disease, and none was particularly effective. 
Patients with stage IV melanoma, which 
has spread to the lymph nodes or other organs, 
had a median survival of just 8–9 months, and 
only 15% lived for more than 3 years. 

“I would sit down in front of them and discuss 
treatments that might work for 10% of 
them at most,” says Ribas, a medical oncologist 
at the University of California, Los Angeles. 
“And I’d say, it probably won’t make a difference 
if we do treatment or not.” 

But things have started to change in melanoma 
care. Since 2011, the US Food and Drug 
Administration (FDA) has approved seven 
treatments for advanced melanoma (see 
‘Treatment of BRAF-mutant melanoma’), 
including one in September that promotes 
an immune response against the cancer, and 
several more are working their way through 
the process. Drug companies have dozens of 
treatments in clinical trials. 

Targeted therapies, which are tailored to a 
patient’s genetic make-up and are designed to 
disable the cancerous cells, have become the 
cornerstone for the treatment of advanced 
melanoma. And drugs that target the immune 
system and enhance its ability to wipe out can- 
cer cells have just entered the clinic. Patients 
who had once failed to respond to the meagre 
range of available drugs are now showing 
strong, long-lasting responses. “It is an amaz- 
ing thing,” says Ribas. 


HITTING THE TARGET 

For many years, cancer was treated according 
to the organ in which it developed, or by bom- 
barding it with chemicals that killed off rapidly 
dividing cells. But then researchers began dis- 
covering the genetic mutations that transform 
a normal cell into a cancerous one. These find- 
ings uncovered mutant proteins that could be 
blocked by new drugs, allowing oncologists to 
selectively target the tumour. 

In the late 1990s, oncologists were excited 
about a new drug called imatinib (Gleevec) 
that homed in on the cancer cells of patients 
with chronic myelogenous leukaemia (CML). 
Most of these patients have an abnormal gene 
rearrangement that produces a protein that 
drives the cancer. In theory, drugs that target 
this protein should cause the cancer to retreat. 

This approach was not limited to leukae- 
mia. Another targeted therapy, Herceptin, was 
shrinking tumours in […] cancer researchers 
looking for similar mutations that push cells 
to develop into melanoma. 

In 2002, researchers working on the Cancer 
Genome Project at the Wellcome Trust Sanger 
Institute near Cambridge, UK, uncovered one 
of melanoma’s weak points. They found that 
two-thirds of melanomas have a tiny change 
in the gene encoding a protein called BRAF 
that is part of a signalling pathway in the cell. 
The mutation changes one amino acid in the 
protein, altering the pathway so that the cells 
multiply without limit. “When I first saw that 
paper, it stopped me in my tracks,” says Keith 
Flaherty, an oncologist at Massachusetts Gen- 
eral Hospital in Boston. Identifying the role 
of BRAF made it possible for the first time to 
develop “a treatment concept for melanoma”, 
he says. 

But it would be years before a promising 
drug became available. Jeff Sosman, a medical 
oncologist at Vanderbilt University Medical 
Center in Nashville, Tennessee, explains: “Until 
2008, we honestly didn’t know if BRAF was 
targetable, and if by inhibiting this enzyme we 
would have an effective therapy,” he says. That 
year, clinics began testing a drug called vemurafenib 
(Zelboraf), which targets the mutant 
BRAF. About half of patients with advanced 
melanoma have a mutation in this protein, 
known as BRAF (V600E), and vemurafenib 
was their first chance at personalized medicine. 

Before 2011, few treatments were available for patients with advanced melanoma. Drugs gave a median 
survival of 8–9 months and only 15% of people lived for more than 3 years. But in the past four years, the 
US Food and Drug Administration has approved seven treatments that target the cancerous cells or trigger 
the immune system to do so, extending patients’ lives. 

The results exceeded all expectations. 
Tumours regressed rapidly and some patients 
improved overnight. In 2010, a small phase I 
trial of vemurafenib showed complete or partial 
tumour regression in 26 of the 32 patients. 
The response was greater than anything previously 
seen with advanced melanoma. In a 
phase III study, Paul Chapman, a specialist in 
metastatic melanoma at the Memorial Sloan 
Kettering Cancer Center in New York, showed 
that after three months of vemurafenib therapy, 
patients with the BRAF (V600E) mutation 
were 74% less likely to die or see their cancer 
worsen than patients who received a standard 
chemotherapy agent. And 48% of them saw 
their tumours shrink or stop growing. 

The FDA fast-tracked the approval of 
vemurafenib for use in people with the BRAF 
(V600E) mutation in 2011, less than four 
months after it was submitted. A second BRAF 
inhibitor, called dabrafenib (Tafinlar), was 
given FDA approval in 2013. 


FACING RESISTANCE 

But cancer is a wily foe. Tumour cells mutate, 
and when a pathway is blocked, they find 
another route. So targeted therapies quickly 
lose their effectiveness, and many people who 
took vemurafenib found that resistance devel- 
oped within six months. The tumours, which 
had once melted away, grew back with new 
mutations that were impervious to the drug. 

Other proteins in the same signalling path- 
way quickly became targets for drug discovery. 
BRAF inhibitors block the MAPK pathway, 
and scientists soon realized that most of the 
resistance comes from reactivation of the path- 
way through mutations in other genes that play 
a part in it. The identification of these genes 
led to the development of more drugs that tar- 
get the pathway, including MEK inhibitors, 
such as trametinib (Mekinist), which became 
the second major player in the treatment of 
advanced melanoma. 

Oncologists then combined anti-BRAF and 
anti-MEK drugs with the aim of preventing the 
development of resistance. With the pathway 
effectively blocked at two points, the tumour 
cells struggled to develop new mutations. In 
a small trial of the two drugs, Ribas and colleagues 
found that more than 85% of patients 
with a BRAF (V600) mutation who had never 
received a BRAF inhibitor responded to the 
combination of drugs, compared with only 
15% of those who had developed BRAF resistance 
during an earlier treatment. Patients 
who had never taken a BRAF inhibitor lived 
for 13.7 months before the disease progressed, 
compared with 2.8 months for those who 
had previously developed resistance to 
vemurafenib. In July 2014, GlaxoSmithKline 
stopped a combined phase III trial of 
trametinib and dabrafenib early because the 
drugs had improved survival ahead 
of its target. “We now have two winning strategies,” 
says Caroline Robert, head of dermatology 
at the Institut Gustave-Roussy in Paris. 

But BRAF is not the only important driver 
mutation in melanoma. Another mutation, in 
the NRAS gene, is found in approximately 20% 
of metastatic melanoma patients. Drug com- 
panies have struggled to find compounds that 
effectively target the mutated NRAS protein, 
however, so they have focused instead on the 
pathways NRAS activates, including MAPK. 
Indeed, says Sosman, inhibiting MAPK “is 
probably not enough, but it needs to be a component 
in the strategy”. 

In July 2014, French researchers reported 
another mechanism of resistance to targeted 
therapies for melanoma. They identified a 
cluster of proteins called eIF4F, which regulates 
protein synthesis. Tumours that respond 
to anti-BRAF drugs have low levels of eIF4F, 
and those that have developed resistance to 
these drugs have more. “Understanding this 
nexus is critical to overcoming resistance to 
cancer therapy,” says Robert, one of the study’s 
authors. The team has identified compounds 
that inhibit eIF4F and enhance the effective- 
ness of vemurafenib in mice with melanomas. 

“It's an interesting target downstream of 
many mechanisms of resistance to BRAF,” says 
Sosman, “and it’s exciting that a potential drug 
might be able to inhibit this effect.” 


IMMUNE RESPONSE 

Long before targeted therapies were possible, 
biomedical researchers had tried using the 
immune system to fight cancer. In the 1990s, 
instead of applying an accelerator to the 
immune system, they tried lifting the brakes by 
blocking the action of a protein called CTLA-4, 
which keeps the immune system's T cells in 
check. CTLA-4 normally has a beneficial role 
in preventing the immune system from attack- 
ing normal tissue. But it is such an effective 
brake that it also stops T cells from destroy- 
ing cancer cells. In 1996, a team led by James 
Allison, now at the University of Texas MD 
Anderson Cancer Center in Houston, showed 
that injecting mice with an antibody that 
blocks CTLA-4 could inhibit tumour growth. 

These findings eventually led to the devel- 
opment of the drug ipilimumab (Yervoy), a 
monoclonal antibody that acts as a ‘checkpoint 
inhibitor’ by binding to the CTLA-4 protein 
and stopping it from applying the brake. Ipilimumab 
was the first drug to extend the lives 
of patients with metastatic disease. In a large 
phase III trial of 676 patients with late-stage 
melanoma, those given ipilimumab survived 
on average for 10 months — almost 4 months 
longer than those given another experimental 
treatment. The FDA approved ipilimumab for 
the treatment of metastatic melanoma in 2011. 

In 2013, a follow-up analysis of 12 studies 
involving more than 1,800 patients given ipili- 
mumab showed that 22% of patients survived 
for 3 years or longer, and some were approach- 
ing 10 years. Checkpoint inhibitors represent 
“a paradigm shift, probably the most important 
discovery in the field”, says Ribas. 

The trouble with ipilimumab is its toxicity. 
Releasing the brake on T cells enables them 
to attack not only cancer, but also normal 
cells in the skin, colon, endocrine system, eye 
and elsewhere, says Sosman, who conducted 
some of the ipilimumab studies. Using the 
drug requires vigilance from hospital staff to 
manage the side effects, and patients may be 
given steroids or even have the treatment dis- 
continued, depending on the severity of the 
side effects. 

Researchers have identified several other 
checkpoint inhibitors that also release the 
brake holding back T cells, but with less toxic- 
ity. Patients with metastatic melanoma often 
have high levels of a protein called PD-L1. 
When PD-L1 binds to a protein called PD-1, 
which is expressed on T cells, it allows cancer 
cells to hide from the immune system. Studies 
have shown that drugs that target these two 
proteins can shrink tumours. 

Ribas and Robert recently led trials that 
used an antibody called pembrolizumab 
(also known as MK-3475) to target PD-1. 
The tumours shrank or disappeared in 52% 
of patients with metastatic melanoma who 
received the drug. Another study found that 
pembrolizumab could slow tumour growth in 
patients who had stopped responding to drugs 
that target CTLA-4. Nearly 90% of those who 
responded to the drug saw their tumours 
shrink or disappear in six months. 

“We see patients who have large, bulky 
melanomas, tumours that two or three years 
ago if they said they didn’t want to be treated, I 
would have said OK,” says Ribas. “But with this 
antibody that releases the PD-1 brake, all of a 
sudden their tumours start melting away with 
limited side effects.” 

Scanning electron micrograph showing a blood vessel providing red blood cells (red) to a melanoma. 

The FDA approved pembrolizumab 
(Keytruda) in September 2014. This is the first 
drug targeting PD-1 or PD-L1 to be approved 
in the United States, although Japan had already 
approved the anti-PD-1 drug nivolumab 
(Opdivo) in July. Anti-PD-1 drugs have been 
developed at a phenomenal speed, taking 
just three years from the first clinical trials to 
approval, says Ribas. 


BETTER TOGETHER 

Now that targeted drugs and immunotherapy 
have been established, the next development 
may be a combination of the two. Doctors can 
examine a tumour’s biological traits and pick 
the best antibody or combination of drugs 
to attack it. For example, says Ribas, PD-L1 
may be an important biological marker that 
will enable oncologists to identify patients 
who will respond best to pembrolizumab. In a 
large ongoing phase I study, almost half of the 
PD-L1-positive patients responded to pem- 
brolizumab treatment, compared with only 
13% of patients with PD-L1-negative tumours. 

Drug companies are enthusiastic about 
immunotherapy because these drugs seem to 
be beneficial in several different types of can- 
cer. Many of these checkpoint inhibitors are 
being tested in other cancers, including renal 
cell carcinoma, lymphomas, lung cancer and 
breast cancer. Although a smaller fraction of 
these patients respond to immunotherapy, the 
responses seem to last longer. 

Ultimately, oncologists aim to combine 
the two treatments to produce a more potent 
effect. Using CTLA-4 and PD-1 inhibitors 
together could further boost T-cell activity by 
releasing the brake at several points during the 
T cell’s interaction with melanoma cells. 

But combining targeted therapy with 
immune therapy might be even more powerful. 
Targeted drugs could wipe out one type of 
cancer cell and force the tumour to adjust by developing 
new mutations. This would expose the remaining cells to 
T cells that have had their brakes released to 
finish the job. 

Today's therapies cannot help everyone with 
advanced melanoma, but physicians now have 
a choice of drugs to target different forms of 
melanoma, and researchers are developing the 
tools to match patients to specific treatments. 
“After more years of doom and gloom than I’d 
care to count, we’ve had this amazing trajectory 
that doesn’t seem done yet,” says Flaherty. 
“Our confidence keeps rising as our patients 
keep surviving.” 


Hannah Hoag is a freelance science writer 
based in Toronto, Canada. 


No hiding in the dark 


Melanoma is most common in light-skinned people, but it can also afflict those with darker 
pigment. Finding out why would help to explain the disease’s origins. 


BY SUJATA GUPTA 


When Jacqueline ‘Jackie’ Smith was 
19, she spotted a large, irregular 
mole along the right side of her 
bikini line. Concerned, she went to the doctor 
and had it removed. The biopsy results came 
back normal, but a few years later, a hard, 
almond-sized growth appeared in the same 
area. “If I had stretch pants on you could see 
the lump,” says Smith, a doctoral student in 
sociology at Syracuse University in New York. 
Doctors thought it was an infection and put 
her on antibiotics. Yet the lump remained. 

A couple of years later, Smith went to the 
doctor again to have the lump removed. This 
time, the biopsy led to a diagnosis of mela- 
noma. The lump was a lymph node filled with 
cancerous cells. “I was told it would be a mira- 
cle if I lived another 5 years,” she says. 


Smith would just be another melanoma 
statistic except she stands out in an important 
way: she's black. Melanoma rates have jumped 
in white people over the past 30 years, but they 
have stayed flat in people of colour. A white 
person in the United States has a 1 in 50 chance 
of developing melanoma, compared with just a 
1 in 1,000 chance for a black person. 

Darker skin contains more melanin, a pig- 
ment that protects against ultraviolet rays. Most 
melanomas in white people can be linked to 
mutations caused by sun exposure, whereas at 
least half of melanomas in black people occur 
on areas not exposed to the sun. But although 
melanoma in dark-skinned people is rare, it’s 
highly lethal. The five-year survival rate of an 
African American diagnosed with melanoma 
is 73% compared with 91% in Caucasians. 

Most melanoma research is done on white 
people, so the reasons for this disparity are 
unknown. Researchers still don’t know what 
causes melanoma in people with dark skin. As 
a result, it is unclear whether treatment should 
differ according to skin colour, or whether pre- 
vention messages that focus on sun protection 
are appropriate for black people. Part of the 
problem is designing a study that classifies 
people by skin colour. The usual ethnic group- 
ings, such as Hispanic, don’t work because 
some Hispanic people have pale skin, whereas 
others are dark. “To put them all into one bas- 
ket and to treat them as one risk group is silly,” 
says Dennis Hughes, a paediatric oncologist at 
the MD Anderson Cancer Center in Houston, 
Texas. “But that is exactly what we do.” 


A WHITER SHADE OF PALE 

The humans who originated under the hot 
African sun some 200,000 years ago were 
almost certainly very dark — the melanin was 
a natural sunblock that prevented the sun’s 
ultraviolet rays from penetrating deep into 
the body and causing radiation damage. But 
it meant they needed to spend considerable 
time outdoors being exposed to the sun to 
synthesize enough vitamin D, which protects 
against osteoporosis and could help to prevent 
autoimmune and inflammatory diseases. But 
as humans began migrating out of Africa to 
dingier climes in East Asia and Europe, their 
skin gradually lightened — a change that led to 
more rapid vitamin D synthesis, but increased 
the risk of skin cancer. 

[Photo caption: Bob Marley died from a brain tumour that arose from acral melanoma in his big toe.] 

Some of these changes in pigmentation can 
be traced to mutations in the MC1R gene, 
which encodes a protein called melanocortin 1 
receptor that controls the type of melanin syn- 
thesized in the skin. When the protein is active, 
it produces a dark pigment known as eumela- 
nin that provides sun protection and helps 
with DNA repair. But mutations in the gene can 
mean that the body instead produces 
pheomelanin, which is abundant in people 
with fair skin, freckles and red hair. 
People of all colours produce both types 
of melanin, just not in the same quantities. 
Spending time in the sun prompts the skin 
to synthesize new melanin. For those with skin 
rich in eumelanin, this typically results in a tan. 
But for many pheomelanin-rich white people, 
burning and blistering is more common — and 
the risk of melanoma jumps for every blistering 
sunburn experienced during childhood. 
But pheomelanin can cause cancer even 
in the absence of ultraviolet light, says David 
Fisher, director of the melanoma programme 
at the Massachusetts General Hospital Cancer 
Center in Boston. He has shown that mice 
bred with the equivalent of red hair and fair 
skin develop melanomas at much higher rates 
than black and albino mice (which lack melanin 
altogether). So although people with dark 
skin produce this dangerous melanin in much 
lower quantities than white people, it could 
explain why they still occasionally develop skin 
cancers, Fisher says. 


BOB MARLEY’S BIG TOE 

In the summer of 1977, Jamaican reggae singer 
Bob Marley was playing soccer in France when 
he injured his right big toe. When the wound 
festered, a doctor removed the toenail. Then 
Marley re-injured the toe during another soc- 
cer game. A new wound appeared. Marley 
went to see another doctor who, shocked by 
the toe’s atrophied appearance, conducted 
a biopsy and diagnosed Marley with mela- 
noma. The doctor advised amputating the toe 
to prevent the cancer from spreading, but Mar- 
ley refused on religious grounds. The cancer 
spread, and in 1981, just four years after the 
initial injury, the dark-skinned singer died of 
a brain tumour. He was 36. 

Marley had acral melanoma, a subtype that 
appears on the palms and soles of the feet, and 
under the nails — areas that have little or no 
sun exposure. Related melanomas can appear 
inside mucous cavities, such as the vagina or 
the mouth. Fewer than 5% of melanomas are 
acral or mucosal, but they account for more 
than half the melanomas found in black people. 
That’s because dark-skinned individuals 
are less susceptible to melanomas related to 
ultraviolet light, so a greater proportion of their 
melanomas have nothing to do with the sun. 

Acral and mucosal melanomas “clearly have 
a different biology” to those linked to sun 
exposure, says Jeffrey Sosman, an oncologist at 
Vanderbilt University in Nashville, Tennessee. 
Scientists now need to work out what causes 
those melanomas — and how to treat them. 
DEVELOPING EARLY 

Jackie Smith had her almond-sized lump 
treated at the Moffitt Cancer Center in Tampa, 
Florida, which is near her parents’ home. Sur- 
geons excised the cancerous lymph nodes 
and radiated the tumour site, and gave Smith 
interferon, an immune therapy that requires 
patients to give themselves regular injections 
for up to a year. The drugs made Smith feel like 
she had a bad case of flu. Her teeth chattered 
constantly and she developed lockjaw from 
the antinausea medication. She had to put her 
doctorate on hold. 

These days, tumours of patients with 
advanced-stage melanomas are sometimes 
genetically sequenced to help determine the 
best treatment. For instance, 60% of tumours 
on sun-exposed areas of skin have mutations 
in the gene BRAF, for which targeted drugs are 
available. But most acral and mucosal melanomas 
have no known genetic cause, making 
treatment more difficult. 

The immune therapy that Smith received 
has only become possible in the past decade. 
Sosman has found that such therapies, which 
help a patient's immune system to fight the 
cancer, seem to be most effective in treating 
melanomas with a high number of genetic 
mutations — that is, those arising from sun 
exposure. That makes sense, he says, because 
mutations probably create abnormal proteins 
that the immune system recognizes as foreign. 
But that means immune therapies may be 
less effective at snuffing out non-sun-related 
tumours, such as those often found in dark- 
skinned people like Smith. 

It's impossible to know what caused Smith's 
cancer or why her treatment worked, especially 
as her tumour was not sequenced. Sun exposure 
could be a culprit, as Smith, despite her 
dark skin, is prone to burning. But her surgeon 
at Moffitt, Vernon Sondak, suggests another 
possibility. He wasn’t able to determine the 
primary site of Smith’s tumour, but he thinks 
it may have arisen from the odd-looking mole 
she had removed when she was 19. That fits 
with data showing that melanomas have been 
rising in children and teens. 

The rise is greatest in white teenage girls, 
as these are frequent users of sunbeds, but a 
slower rise has also been observed in younger 
children. Although fewer than 5% of mela- 
nomas in the United States appear in adults 
with dark skin, the figure is much higher in 
children. One study found that almost 18% of 
melanoma patients aged between 1 and 4 were 
non-white. The implications for Smith’s case 
are clear. “Maybe this is something that started 
when she was much, much younger and just 
took many years to show up,” Sondak says. 


DELAYED DIAGNOSIS 

Now, seven years after her diagnosis, Smith is 
just a few months away from finally completing 
her doctorate. Life has almost returned to nor- 
mal. But partly because of her late diagnosis, 
she still suffers from some problems. She has 
periodic swelling, called lymphoedema, in her 
right leg, caused by the removal of the lymph 
nodes in her groin. She has to wear a compres- 
sion stocking, and wearing heels can be dif- 
ficult because her feet swell. 

Such late-stage diagnoses are common in 
people of colour. In 2006, when Robert Kirsner, 
head of dermatology at the University of 
Miami's Miller School of Medicine, compared 
the stage of diagnosis among nearly 1,700 
white, black and Hispanic patients in Miami- 
Dade County in Florida, he found something 
troubling. Only 16% of whites were diagnosed 
after the tumour had begun to metastasize, but 
that jumped to 26% in Hispanics and 52% in 
blacks — a pattern Kirsner says could explain 
the higher mortality rates from melanoma 
among minorities. His subsequent work 
suggests that the delays in diagnosis may be 
socioeconomic or related to inadequate public- 
health campaigns. Patients and clinicians often 
don't even realize that dark-skinned people can 
get melanoma, he says. 

To address this disparity, the American 
Academy of Dermatology (AAD) convened a 
working group of skin-colour specialists and 
issued fresh guidelines earlier this year. 
They suggested that all non-Caucasians 
conduct a thorough skin exam once a 
month, paying special attention to the 
palms of the hands, the soles of the feet, 
under the nails, and body cavities. They also 
reminded people of colour to follow the same 
stringent sun safety measures as white people: 
seek shade whenever possible, wear protective 
clothing and hats, apply sunscreen regularly, 
and avoid sunbeds. “Even though their risk is 
lower than very fair-skinned Caucasians, it’s 
not zero,” says Henry Lim of the Henry Ford 
Hospital in Detroit, Michigan, who led the 
AAD group. 


COLOURING THE ADVICE 

Will such stringent guidelines lower mela- 
noma rates in people with dark skin and help 
reduce the ethnic disparities in health out- 
comes? Research and prevention messages 
for melanoma are based almost exclusively on 
whites, so it’s not at all clear. 

The problem starts with the basics, Kirsner 
says. The standard self-examination instruc- 
tions tell people to look out for moles that 
are asymmetric, have irregular borders, are 
unevenly coloured, are larger than 6 mm in 
diameter, or are changing. But these guidelines, 
says Kirsner, “are based on white people”. Can- 
cerous moles on dark skin may look different, 
he explains. 

What's more, studies of melanoma in people 
of colour have largely focused on ethnicity, 
rather than skin colour. Giving advice to 
‘Hispanics’, ‘African Americans’ or ‘Asians’ doesn't 
make much sense because someone's ethnicity 
says little about their skin colour, which is 
the main determinant of melanoma risk, says 
Nina Jablonski, an anthropologist at Pennsylvania 
State University in University Park, who 
specializes in the evolution of skin colour. 
Yet this is precisely what happens. The AAD 
report, for instance, defined Caucasians 
as “non-Hispanic individuals of European 
descent”. Everyone else — from lightly pig- 
mented Asians and Asian Indians to Africans 
— were lumped together as “people of colour”. 
“That's a tremendously heterogeneous group,” 
Jablonski says. 

There is little doubt that advising a fair- 
skinned redhead to treat the sun as a car- 
cinogen is scientifically sound, but it’s less 
clear for people of colour. Given the rarity 
of melanomas in dark-skinned individuals, 
coupled with their high proportion of acral or 
mucosal melanomas, the odds of them devel- 
oping melanoma from excessive sun exposure 
are slim. “Do we need to give them the same 
photo-protection advice?” asks Lim. “Probably 
not.” The challenge, he says, is coming up with 
personalized guidelines that are easy to follow 
— but this could take several years, so the mes- 
sage will remain the same for now. 

Australia and some European countries 
have already personalized skin protection 
advice based on skin colour, however. Dark- 
skinned individuals are generally told that 
limited sun exposure is fine, even healthy, as it 
promotes vitamin D synthesis. In the United 
States, dark-skinned people are advised to take 
vitamin D supplements instead. 
Education and outreach may not help much 
either. When dark-skinned individuals 
and white people present with tumours of 
the same size, the melanoma in the person with 
dark skin is more likely to have metastasized. 
This suggests that people with dark skin may 
be predisposed to more severe forms of melanoma, 
making early detection difficult. 

The first step to understanding what's going 
on, says Esteban Parra, a molecular anthropol- 
ogist at the University of Toronto in Canada, 
is to measure skin colour objectively. These 
quantitative skin colour scores could then be 
matched to tumour sequencing studies to dis- 
tinguish between genetic variants that increase 
skin-cancer risk by altering pigmentation and 
variants that increase risk but have no bearing 
on pigmentation. 

Parra points to a pair of studies that exem- 
plify this approach. Researchers looked at 12 
variants in 4 genes known to be involved in 
pigmentation to determine if and how those 
genes altered skin colour in Japanese people. 
The researchers assessed pigmentation by 
using a spectrophotometer, which measures 
the reflectance of skin, and found that variants 
of a gene known as OCA2 lightened skin 
colour. 

This year, the same researchers found that 
these skin-lightening variants also increased 
the likelihood of developing skin cancer, enabling 
them to draw a clear line from genetic 
variation to skin colour to cancer risk. “It will 
be fantastic if more people start including 
quantitative measures of pigmentation in their 
research,” Parra says. 

Until then, the best advice is for people of all 
colours to get to know their skin, and to have 
it checked if they see something amiss. Jackie 
Smith credits her doggedness for saving her 
life. “We all have this sense about something 
not being right,” she says. “I had that sense but 
I was also really happy when the doctor said, 
‘Oh this is nothing to worry about’.” But she 
still felt uneasy and went back to the doctor, 
and it paid off. “I’m still here,” she says. 


Sujata Gupta is a freelance writer based in 
Burlington, Vermont. 
PROTECTION 


The sunscreen pill 


A tablet that protects against sunburn is an attractive idea, but the science is patchy. 


BY ERIN BIBA 


It sounds like a lazy sunbather’s dream come 
true: a pill that has all the protective properties 
of sunscreen without the bother of 
slathering yourself in lotion or remembering to 
re-apply it. Over the years, research into such a 
pill has yielded a slew of over-the-counter supplements 
that claim to fight sun damage to the 
skin, mostly based on the fact that they contain 
antioxidants. But the US Food and Drug 
Administration (FDA) doesn't regulate supplements, 
so none of these products have needed 
to prove their effectiveness. Despite much 
research and a plethora of claims by manufacturers, 
the problems of moving antioxidants 
through the human body make it tricky to 
develop a pill that can replace sunscreen lotion. 

Many of the current pills are based on an 
antioxidant-rich extract from the tropical 
fern Polypodium leucotomos, although a UK 
researcher is trying to patent an extract from 
algae found on coral. And there are reasons to 
suppose that antioxidants might help. Exposing 
the skin to ultraviolet radiation triggers the 
formation of certain reactive oxygen species 
known as free radicals that damage skin cells 
and can ultimately lead to malignancy. Antioxi- 
dants are known to destroy free radicals in the 
body and on the skin. The hard part is getting 
the antioxidants from the stomach to the skin. 

Salvador Gonzalez, a dermatologist based in 
Madrid, Spain, who works as a consultant with 
the Memorial Sloan Kettering Cancer Center 
in New York, has been studying the fern extract 
since the early 1990s. But making it work effec- 
tively in pill form is difficult, he says. 


LESS RADICAL 

Scientists have tested the extract against vari- 
ous diseases and disorders such as skin cancer. 
They have injected it, applied it topically to the 
skin, and given it to patients in pill form. All 
these methods revealed at least some reduction 
in the amount of free radicals on the skin. 
But pills were the least beneficial route, largely 
because of the way the body’s metabolism 
interacts with the extract. 

“If you think about taking a pill by mouth, 
it has to go through multiple steps,” explains 
Henry Lim, a dermatologist at the Henry 
Ford Hospital in Detroit, Michigan. “It has to 
be absorbed, go through the blood and then 
through the liver before it gets to the skin.” This 
is especially problematic for an antioxidant- 
based sunscreen pill because antioxidants, 
by their very nature, are unstable and tend to 
break down before they reach the target. 
There is some evidence that antioxidants do 
reach the skin, however. A small 2004 study 
in which people were given oral doses of the 
fern extract after exposure to ultraviolet light 
found that their skin was less red and had fewer 
sunburnt cells than subjects not given the 
extract. And a 1997 study looked for markers 
of cell damage caused by exposure to ultraviolet 
light in ten volunteers who ingested the fern 
extract. The extract boosted the ability of the 
immune system to repair the damage caused 
by sunlight, and reduced the reaction of the 
skin cells to ultraviolet that results in sunburn. 
They also exposed subjects to twice the thresh- 
old of ultraviolet needed to cause sunburn 
and found that damage in those given the fern 
extract decreased by 84%, whereas it increased 
by 217% in subjects not given the extract. The 
results were not statistically significant, but the 
researchers suggested that larger studies may 
show that the fern extract protects the skin. 

[Illustration credit: Susan Burghart] 

Despite these data, Lim — who has worked 
as a consultant to Ferndale Healthcare, a sup- 
plement manufacturer in Detroit that makes a 
fern-based sunscreen pill — says no dermatolo- 
gist would currently recommend using a pill 
instead of sun lotion. “None of the pills at this 
moment are 100% successful,” he says. 


LOOSE REGULATION 

One problem in assessing the pills currently 
on the market is that they are deemed to be 
supplements, not medicines, so they are not 
regulated by the FDA. As long as the manufac- 
turer makes no false or misleading claims, and 
there is no immediate health threat, the makers 
can sell whatever supplements they want — it’s 
up to the consumer to decide whether they are 
worthwhile or not. 

In the United States, supplements are reg- 
ulated more loosely than sunscreen lotion, 
which is viewed as both a cosmetic and a 
drug. Cosmetics are regarded as anything that 
is applied to the body for cleansing or beautify- 
ing, and a drug is something intended for treat- 
ment or prevention. Because sunscreen lotion 
is both, it must follow the regulations for each 
type of product. Cosmetics don’t require FDA 
approval, but drugs do, so sunscreen lotion is 
held to a higher standard than normal mois- 
turizer — and also higher than supplements. 

In August 2013, the American Academy 
of Dermatology released a statement on oral 
sunscreens declaring that there is “no scientific 
evidence that oral supplements alone can 
provide an adequate level of protection from 
the sun’s damaging ultraviolet rays.” 

Dermatologists say that a pill may well be 
a reasonable addition to a cream-based sun 
protection regimen, which should also include 
wearing long clothing and a hat, and staying 
in the shade. In a series of studies Gonzalez 
has conducted over the years, he was able to 
achieve a sun protection factor (SPF) of just 2 
from the fern-based pill, compared with SPFs 
ranging from 15 to 50 for sun creams on the 
market in the United States. “Increasing the 
amount of antioxidants in a pill to a level that 
could robustly block sun damage would prob- 
ably cause unwanted side effects,” he says. 


MAKE TAN 
The most promising example of a non-topical 
sunscreen is a prescription drug created by 
the company Clinuvel Pharmaceuticals based 
in Melbourne, Australia. Known as Scenesse 
(afamelanotide), and currently awaiting FDA 
approval for marketing in the United States, 
it is a chemical analogue of a naturally occurring 
hormone, α-melanocyte-stimulating 
hormone, that is released into the body on 
exposure to ultraviolet radiation. The hor- 
mone — and the drug — triggers skin cells to 
release the dark pigment melanin, as they do 
to create a tan when skin is exposed to the sun. 
Tanning creates a natural shield against 
ultraviolet radiation. The melanin acts as a 
filter, screening out some of the wavelengths 
of sunlight that induce the formation of dan- 
gerous free radicals. Lim, who consulted with 
Clinuvel while they were developing the drug, 
says that anyone who takes Scenesse would 
eventually become very tanned and, as a result, 
would be much less likely to burn. 

But Scenesse is not marketed at general con- 
sumers — the FDA approval would be for use 
as a prescription drug to treat people with dis- 
eases such as vitiligo that make them extremely 
photosensitive. Clinuvel hopes the drug can 
also be used to treat people with photoderma- 
tosis, a disorder that causes mild-to-severe skin 
rashes after exposure to ultraviolet radiation. 

If approved, Scenesse will not be administered 
as an oral pill, but as an implant the size 
of a grain of rice that is injected under the skin. 
Tanning from the injection will start within 
two days and last up to two months before 
another injection is needed. However, because 
it is injected, and is only indicated for severe 
photosensitivity disorders, it is impractical 
as an everyday treatment for people who lack 
sun-sensitivity diseases. The injection would 
protect patients from severe sun damage, but 
Clinuvel actively discourages people from 
thinking of the drug as a sunscreen pill. 

Another lead in the search for a pill to pre- 
vent sun damage comes from Paul Long’s lab 
at King’s College London — and it’s based on 
compounds made by algae that live on coral. 
Over the past five years, Long has been study- 
ing mycosporine-like amino acids (MAAs), 
which are naturally occurring sunscreens pro- 
duced by organisms that live in clear, shallow 
water and so are exposed to high levels of ultra- 
violet radiation. Long discovered that the algae 
living inside coral produce MAAs and pass 
them to the coral they live on. Both organisms, 
and the fish that eventually feed on them, are 
protected by the MAAs, which absorb ultra- 
violet radiation before it can damage them. By 
sequencing the coral’s genome, Long identified 
the genes that encode the pathway that allows 
the coral to take up and use the MAAs. 

Long is trying to patent the ingredient for 
use in pills, but it’s already proving effective in 
other products. In 2012, King’s College Lon- 
don entered into partnership with Aethic, a 
UK skincare company, to commercialize the 
use of MAAs in sunscreen lotions. 

Gonzalez says the research is promising but 
that MAAs will be one of many sun-protec- 
tive compounds derived from nature, none of 
which is fully effective in blocking the sun. So 
in the end, any sun-protection regimen will 
still have to include lotion and a good hat. 


Erin Biba is a science writer based in New York 
City. 
Protect the USA from UVA 


The United States does not have access to the latest sunscreens. The 
Sunscreen Innovation Act could set that right, says Michael J. Werner. 


According to the Skin Cancer Foundation, more than 3.5 million 
skin cancers and 76,000 melanomas are diagnosed each 
year in the United States, and, on average, one person dies 
from melanoma every hour. As with most diseases, the best way to 
fight melanoma is to prevent it. Unfortunately, the latest sunscreen 
ingredients that can help to reduce the risk of melanoma and other 
skin cancers have languished for decades awaiting approval from the 
US Food and Drug Administration (FDA). 

The ultraviolet (UV) filters in sunscreen work by absorbing, reflecting 
or scattering the UV light emitted by the Sun. UVA radiation, 
which represents roughly 90% of UV radiation, can accelerate skin 
ageing, cause skin damage and create a risk of skin cancer by damaging 
DNA. The other component, UVB, leads to sunburn and also increases 
the risk of skin cancer. The most effective protection blocks both UVA 
and UVB. But ingredients delayed by the FDA approval process would 
provide additional options, especially for UVA 
protection. 

The active ingredients used in sunscreens 
are regulated by the FDA as drugs. But the FDA 
has not approved an over-the-counter sunscreen 
ingredient since 1999. In 2002, it created 
a new pathway to market for non-prescription 
ingredients, such as sunscreens, that allowed 
manufacturers to use data from other countries 
to establish that a product is safe and effective. 
To qualify for this ‘time and extent application’ 
(TEA) process, the company must establish that 
a product is approved in at least one comparable 
country and that it has been in use for at least 
five years in sufficient quantity. The TEA process 
was designed to streamline the review of new 
ingredients, and the FDA said that it expected to 
complete the evaluation of sunscreen ingredients 
within 90-180 days. 


SLOW PROGRESS 

Unfortunately, it has not gone according to plan. After more than 
12 years, the FDA has still not approved a single sunscreen ingredient 
through the TEA process. This means that Americans still lag behind 
the rest of the world regarding access to the latest UVA filters — even 
though these ingredients now have a long history of safe use in Europe, 
Australia and other parts of the world. 

There are currently eight ingredients waiting for a decision from 
the FDA, some of which were submitted for approval as long ago as 
2002. Bemotrizinol, for example, has been languishing in the TEA 
queue since 2005, despite being approved for use in the European 
Union (EU) in 2000. 

In the past few months, some manufacturers have received letters in 
response to their applications, but for many this was the first feedback 
they had received. In the letters, the FDA consistently argues that the 
products must undergo additional safety testing. 

The FDA seems to be backtracking on the TEA process. At a recent 
meeting of its Nonprescription Drugs Advisory Committee about 
pending sunscreen ingredients, the FDA argued that the approval 
in a comparable jurisdiction, such as the EU, and experience of safe 
marketing is insufficient to support the approval of a sunscreen ingredient 
in the United States. Rather, the FDA would like companies to 
perform additional safety testing unique to the United States. This 
might include studies of dermal safety, bioavailability, carcinogenicity, 
developmental and reproductive toxicity, and toxicokinetics. The FDA 
acknowledged that some of these tests would take at least two years. 

The FDA’s sluggish regulatory response prompted the formation 
of the Public Access to Sunscreens (PASS) Coalition in March 
2013, for which I am a policy adviser. The coalition’s mission is to 
work with the FDA, Congress, the White House, health providers 
and consumer organizations to establish a regulatory pathway for 
the timely and transparent pre-market review of new, safe and effec- 
tive sunscreen ingredients. The coalition, which comprises cancer 
research organizations, academic scientists and 
sunscreen manufacturers among others, thinks 
that the FDA should ensure it is adopting a risk- 
based approach, taking into account the known 
risk of skin cancer and melanoma, and balanc- 
ing the benefits of sunscreen protection against 
the potential risks. Additional testing should be 
required only if international experience, adverse 
event reporting, or other scientific information 
reveals that the product's risk profile demands it. 

Efforts by PASS led to the introduction of 
the Sunscreen Innovation Act in March 2014. 
The act reforms the TEA process to establish a 
predictable and transparent process for the 
review of sunscreen ingredients to ensure that 
safe and effective products reach the market 
as soon as possible. It maintains the existing 
requirements for TEA products but ensures that 
the FDA’s safety and effectiveness review is completed within statutory 
deadlines in a transparent way, including an opportunity for public 
comment. The act calls for a formal evaluation of the process and 
requires reports on the FDA’s progress in processing applications to 
be made available to the public. 

The bipartisan act passed the US Senate unanimously in Sep- 
tember 2014 and the US House of Representatives unanimously in 
November 2014. It is expected to be signed by the President later this 
year. 

The PASS coalition continues to fight for the enactment of the Sun- 
screen Innovation Act and to ensure that safe sunscreens reach the 
market as soon as possible. This provides a responsible solution to a 
problem that is exacerbating a public-health crisis. Giving Americans 
more choices and promoting sunscreen innovation will go a long way 
towards preventing a deadly disease. 


Michael J. Werner is a partner with Holland and Knight in 
Washington DC and a policy adviser for the Public Access to 
Sunscreens Coalition. 

e-mail: michael.werner@hklaw.com 